NEYMAN'S C(α) TEST FOR UNOBSERVED HETEROGENEITY



JIAYING GU 

UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN 

Abstract. A unified framework is proposed for tests of unobserved heterogeneity in
parametric statistical models based on Neyman's C(α) approach. Such tests are irregular in the
sense that the first order derivative of the log likelihood with respect to the heterogeneity
parameter is identically zero, and consequently the conventional Fisher information about
the parameter is zero. Nevertheless, local asymptotic optimality of the C(α) tests can be
established via LeCam's differentiability in quadratic mean approach. This leads to local
alternatives of order $n^{-1/4}$. Many such tests are already familiar from the existing literature,
but the new framework reveals that certain regularity conditions commonly employed in
earlier developments are unnecessary.



1. Introduction 

Neyman's (1959) C(α) test can be viewed as a generalization of Rao's (1948) score test in
the presence of nuisance parameters and thus provides a unified framework for parametric
statistical inference. We will see that many of the existing tests for neglected parameter
heterogeneity can also be formulated as C(α) tests and share common features. However,
for these tests the usual score function is identically zero under the null hypothesis, and
conventional Fisher information is thus zero. Fortunately, in these cases the second
derivative of the log likelihood is non-degenerate and approximations based on it can be used to
form a modified version of LeCam's differentiability in quadratic mean (DQM) condition.
Local asymptotic normality (LAN) theory then leads to local asymptotic optimality results
for the C(α) test in such settings under local alternatives of order $n^{-1/4}$.

We focus initially on the case of a scalar heterogeneity parameter; extensions to
multivariate settings are briefly described at the end of Section 2. In Section 3 we consider three
different examples and show that the C(α) test leads to familiar test statistics proposed in
the econometric literature for different parametric models. The C(α) tests for parameter
heterogeneity in the Poisson regression model under two slightly different alternative
specifications lead to those in Lee (1986). Kiefer (1984) and Lancaster (1985) derive tests for
parametric heterogeneity in the Cox proportional hazard model, both of which can be formulated
as C(α) tests. We also construct a joint C(α) test in a Gaussian panel data model for
heterogeneous location and scale parameters.

The C(α) test for heterogeneity formulated in this paper is very similar to the setup used
in some previous developments. In a seminal paper, Chesher (1984) points out that the score test
for unobserved parametric heterogeneity is identical to White's (1982) Information Matrix
(IM) test. Cox (1983) obtains similar results using a more general mixture model. These
papers can be viewed as important further developments of a somewhat neglected example
on testing for parameter heterogeneity in a Poisson model in Neyman and Scott (1966).
Moran (1973) investigates the asymptotic behavior of these score tests. However, as we
will show in Section 4, the parameterization adopted in Moran (1973) and also Chesher
(1984) requires unnecessary additional assumptions, even though it delivers the same test
statistics as the C(α) test constructed here. We conclude in Section 4 that the C(α) test
for unobserved heterogeneity is not always identical to the IM test, and illustrate some
conditions for equivalence to hold.



2. The C(α) test for unobserved parameter heterogeneity

Neyman (1959) introduces the C(α) test with the consideration that hypothesis testing
problems in applied research often involve several nuisance parameters. In these composite
testing problems, most powerful tests do not exist, motivating the search for an optimal test
procedure that yields the highest power among the class of tests obtaining the same size.
The locally asymptotically optimal C(α) test employs regularity conditions inherited from
the conditions used by Cramér (1946) for showing consistency of the MLE and some further
restrictions on the testing function to allow for replacing the unknown nuisance parameters
by their $\sqrt{n}$-consistent estimators. It is the confluence of these Cramér conditions and the
maintained significance level α that gives the name to the C(α) test.

2.1. C(α) test in regular cases. In regular cases, where all the score functions with
respect to parameters in the model are non-degenerate and the Fisher information matrix
is non-singular, the C(α) test is constructed as follows. Suppose $X_1, \ldots, X_n$ are
i.i.d. random variables with density $p(x; \xi, \theta)$, where $\theta \in \Theta \subset \mathbb{R}^p$ are nuisance
parameters and $\xi \in \Xi \subset \mathbb{R}^q$ are the parameters under test. For densities satisfying
the regularity conditions (Neyman (1959, Definition 3)), we consider testing the hypothesis
$H_0 : \xi = \xi_0$ against $H_a : \xi \in \Xi \setminus \{\xi_0\}$ while the nuisance parameters $\theta \in \Theta$ are left unspecified.
We define the conventional score functions as

$$C_{\xi,n} = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \nabla_\xi \log p(X_i; \xi, \theta)\Big|_{\xi = \xi_0}, \qquad
C_{\theta,n} = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \nabla_\theta \log p(X_i; \xi_0, \theta),$$

and define the corresponding matrix of second-order derivatives,

$$I = \begin{pmatrix} I_{\xi\xi} & I_{\xi\theta} \\ I_{\theta\xi} & I_{\theta\theta} \end{pmatrix},$$

as its Fisher information covariance matrix.

Since the nuisance parameters $\theta$ are left unspecified by $H_0$, Neyman (1959) shows that for the
test statistic to have the same asymptotic behavior when we replace the nuisance parameters
$\theta$ by any $\sqrt{n}$-consistent estimator $\hat\theta_n$, it is necessary and sufficient for the test statistic to
be orthogonal to $C_{\theta,n}$. For example, the "residual" score, obtained by projecting $C_{\xi,n}$ onto
the space spanned by the score vector $C_{\theta,n}$ and taking the residual, denoted by

$$g_n(\theta) = C_{\xi,n} - I_{\xi\theta} I_{\theta\theta}^{-1} C_{\theta,n},$$

provides such a test function with variance $I_{\xi\cdot\theta} = I_{\xi\xi} - I_{\xi\theta} I_{\theta\theta}^{-1} I_{\theta\xi}$. Given a $\sqrt{n}$-consistent
estimator $\hat\theta_n$ for $\theta$, the C(α) test

$$T_n(\hat\theta_n) = g_n(\hat\theta_n)^\top I_{\xi\cdot\theta}^{-1}\, g_n(\hat\theta_n)$$

is then asymptotically $\chi^2_q$ under $H_0$ and is optimal for local alternatives of the form
$\xi_n = \xi_0 + \delta/\sqrt{n}$. When $\hat\theta_n$ is the restricted maximum likelihood estimator of $\theta$, $C_{\theta,n}$
is zero and the C(α) test reduces to Rao's score test. The component $I_{\xi\theta} I_{\theta\theta}^{-1} I_{\theta\xi}$ subtracted
from the information $I_{\xi\xi}$ for $\xi$ measures the amount of information lost due to not knowing
the nuisance parameters (see e.g. Bickel, Klaassen, Ritov, and Wellner (1993), Section 2.4).
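
The construction of the residual score can be sketched in a few lines of code. The following is a minimal illustration, assuming a scalar parameter under test and a scalar nuisance parameter so that no matrix algebra is needed; the function name, argument names, and numbers are hypothetical and are not taken from the paper.

```python
def c_alpha_statistic(c_xi, c_theta, i_xixi, i_xitheta, i_thetatheta):
    """Regular C(alpha) statistic for scalar xi and scalar theta.

    c_xi, c_theta : normed scores evaluated at (xi_0, theta_hat)
    i_*           : entries of the Fisher information matrix
    """
    # Residual score: remove from the score for xi its projection
    # on the score for theta.
    g = c_xi - (i_xitheta / i_thetatheta) * c_theta
    # Information left for xi once theta has to be estimated.
    i_eff = i_xixi - i_xitheta ** 2 / i_thetatheta
    # Quadratic form; asymptotically chi-square with 1 d.f. under H0.
    return g * g / i_eff


# Hypothetical numbers, for illustration only.
t_stat = c_alpha_statistic(c_xi=2.0, c_theta=1.0,
                           i_xixi=4.0, i_xitheta=1.0, i_thetatheta=2.0)
```

Setting `c_theta` to zero, as happens at the restricted maximum likelihood estimator, leaves the score for ξ unchanged while the effective information is still reduced, which is the score-test special case described above.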



2.2. Testing for unobserved parameter heterogeneity. The C(α) test for unobserved
heterogeneity is usually formulated under a random parameter model. Following Neyman
and Scott (1966) we will focus initially on testing homogeneity of a scalar parameter against
the alternative that the parameter is random. Consider i.i.d. random variables
$X_1, \ldots, X_n$, with each $X_i$ having density function $p(x; \lambda_i)$. Heterogeneity of the model is
introduced by regarding the individual specific $\lambda_i$ as a random parameter of the form

$$\lambda_i = \lambda_0 + \tau \xi U_i,$$

where the unobserved $U_i$'s are independent random variables with common distribution
function $F$ satisfying the moment conditions $E(U) = 0$, $V(U) = 1$. The parameter $\tau$ is a
known finite scale parameter. It is not restrictive to assume $\tau$ known, as we will see later
that $\tau$ does not enter the test statistics. The hypothesis we would like to test is $H_0 : \xi = 0$,
which implies $\lambda_i = \lambda_0$ for all $i$. The alternative hypothesis is $H_a : \xi \neq 0$.



Under the above setup, the standard C(α) test described in Section 2.1 breaks down
because the score function for $\xi$ for each individual observation $X_i$, defined as the first order
logarithmic derivative of the density function with respect to $\xi$, is identically zero under
the null, hence the Fisher information is also zero.

$$\frac{\partial}{\partial \xi} \log \int p(x_i; \lambda_0 + \tau \xi u)\, dF(u) \Big|_{\xi = 0}
= \tau \int u \, dF(u) \, \frac{p'(x_i; \lambda_0)}{p(x_i; \lambda_0)} = 0.$$



However, Neyman (1959, p. 224, Corollary 2) was already aware of this possibility and
suggested computing the second-order derivative, denoted as $s_i(\lambda_0)$ below.

$$s_i(\lambda_0) = \frac{\partial^2}{\partial \xi^2} \log \int p(x_i; \lambda_0 + \tau \xi u)\, dF(u) \Big|_{\xi = 0}
= \tau^2 \int u^2 \, dF(u) \, \frac{p''(x_i; \lambda_0)}{p(x_i; \lambda_0)}
= \tau^2 \frac{p''(x_i; \lambda_0)}{p(x_i; \lambda_0)}.$$

The normed sum of these independent second-order derivatives, $s(\lambda_0) = n^{-1/2} \sum_i s_i(\lambda_0)$, can
be shown to be asymptotically normally distributed with mean zero and variance $E(s_i^2(\lambda_0))$
under $H_0$ by the central limit theorem. This leads to a close analogy with the regular
theory, in which $s(\lambda_0)$ acts as the score function and the variance $E(s_i^2(\lambda_0))$ plays the role of
the Fisher information in the irregular setting considered here.
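
As a concrete check of these statements, the sketch below works out the second-order score for a Poisson density with mean λ, for which differentiating twice gives $p''(x;\lambda)/p(x;\lambda) = ((x - \lambda)^2 - x)/\lambda^2$. The Poisson choice, the default τ = 1, and all function names are assumptions made for illustration; the null moments are computed by summing over a truncated support rather than by simulation.

```python
import math


def poisson_pmf(x, lam):
    """Poisson probability mass function p(x; lam)."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def second_order_score(x, lam, tau=1.0):
    """s_i(lam) = tau^2 * p''(x; lam) / p(x; lam) for the Poisson density."""
    return tau ** 2 * ((x - lam) ** 2 - x) / lam ** 2


def normed_sum(xs, lam, tau=1.0):
    """s(lam) = n^{-1/2} * sum_i s_i(lam)."""
    return sum(second_order_score(x, lam, tau) for x in xs) / math.sqrt(len(xs))


# Null moments of s_i, summing over the support (truncated far in the tail).
lam0 = 2.0
support = range(60)
mean_s = sum(poisson_pmf(x, lam0) * second_order_score(x, lam0) for x in support)
var_s = sum(poisson_pmf(x, lam0) * second_order_score(x, lam0) ** 2 for x in support)
```

Here `mean_s` is zero up to rounding and `var_s` equals $2/\lambda_0^2$, the quantity that plays the role of the Fisher information in this example.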

In regular cases, score tests exploit the fact that if the null hypothesis is false, the
first-order gradient of the log likelihood should not be close to zero. This evidently fails in the
irregular case because, no matter how the data are generated, the gradient is always zero. It is
natural then to make use of the curvature information (i.e. the second-order condition)
provided by the second-order derivative for inference. If the null is false, one expects the
second-order derivative to be positive. We will see that this second-order score function plays
the essential role in constructing the C(α) test for unobserved heterogeneity. The positivity
condition also anticipates that the C(α) test is one-sided. The goal of the remaining part
of this section is to show that the optimality of the C(α) test, as in the regular case, is
still preserved under this irregularity and that its asymptotic theory, although different from the
regular case in certain respects, still takes a simple form.

2.3. Asymptotic optimality of the C(α) test for parameter heterogeneity. Under
the irregularity discussed above, in order to establish the optimality of the test statistic
based on the second-order score function, one could consider modifying the Cramér type
regularity conditions in Neyman (1959, Definition 3), requiring the density function to
be five times differentiable and imposing a Lipschitz condition on the fifth order derivative
with respect to the parameter under test. The main motivation is to obtain a quadratic
approximation of the log likelihood ratio using the second-order score function through a
higher order Taylor expansion. To be more specific, using the example in Section 2.2 as an
illustration, for local alternatives $\lambda_i = \lambda_0 + \tau \xi_n U_i$, with $\xi_n$ a sequence that converges
to zero at a certain rate, we have the following Taylor expansion of the log likelihood ratio.

$$\Lambda_n = \sum_i \log \frac{\int p(x_i; \lambda_0 + \tau \xi_n u)\, dF(u)}{p(x_i; \lambda_0)}
= \frac{\xi_n^2}{2} \sum_i s_i(\lambda_0)
+ \frac{\xi_n^3}{3!} \tau^3 E(U^3) \sum_i \frac{p'''(x_i; \lambda_0)}{p(x_i; \lambda_0)}
+ \frac{\xi_n^4}{4!} \Big[ \tau^4 E(U^4) \sum_i \frac{p''''(x_i; \lambda_0)}{p(x_i; \lambda_0)} - 3 \sum_i s_i^2(\lambda_0) \Big]
+ o_p(1).$$

Let $\xi_n$ be of order $n^{-1/4}$. Provided the third and fourth moments of $U$ are finite in
addition to the zero mean and unit variance assumption, we obtain a quadratic approximation
of the log-likelihood. More details of such regularity conditions can be found in Rotnitzky,
Cox, Bottai, and Robins (2000), in which they consider the maximum likelihood estimation
of $\xi$ in the irregular cases in a very general context.
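
The role of the $n^{-1/4}$ rate can be made explicit with a short heuristic calculation. The sketch below is not a substitute for the formal conditions: it writes $\xi_n = \delta n^{-1/4}$ for a fixed local parameter $\delta$ and assumes that each ratio $p^{(k)}(x_i;\lambda_0)/p(x_i;\lambda_0)$ has mean zero and finite variance under the null, so that the central limit theorem and the law of large numbers apply term by term.

```latex
\begin{align*}
\frac{\xi_n^2}{2}\sum_{i=1}^n s_i(\lambda_0)
  &= \frac{\delta^2}{2}\, n^{-1/2}\sum_{i=1}^n s_i(\lambda_0) = O_p(1), \\
\frac{\xi_n^3}{3!}\,\tau^3 E(U^3)\sum_{i=1}^n \frac{p'''(x_i;\lambda_0)}{p(x_i;\lambda_0)}
  &= O\!\left(n^{-3/4}\right) O_p\!\left(n^{1/2}\right) = o_p(1), \\
\frac{\xi_n^4}{4!}\,\tau^4 E(U^4)\sum_{i=1}^n \frac{p''''(x_i;\lambda_0)}{p(x_i;\lambda_0)}
  &= O\!\left(n^{-1}\right) O_p\!\left(n^{1/2}\right) = o_p(1), \\
-\frac{3\,\xi_n^4}{4!}\sum_{i=1}^n s_i^2(\lambda_0)
  &= -\frac{\delta^4}{8}\, n^{-1}\sum_{i=1}^n s_i^2(\lambda_0)
  \;\xrightarrow{\;p\;}\; -\frac{\delta^4}{8}\, E\!\left(s_i^2(\lambda_0)\right).
\end{align*}
```

Under these assumptions only the first and last lines survive, so the log likelihood ratio is approximately linear in the normed sum $s(\lambda_0)$ with a deterministic quadratic correction in $\delta^2$. A faster rate for $\xi_n$ would make every term vanish, and a slower one would make the first term diverge.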

An alternative formulation, rooted in LeCam's local asymptotic normality (LAN)
theory, can be based on his differentiability in quadratic mean (DQM) condition. The latter
condition is less stringent in regular cases: while the Cramér conditions assume the density to
be three times differentiable and impose a Lipschitz condition on the third order derivative,
the DQM condition only requires first order differentiability and the derivative to be square
integrable in $L^2$ space. Pollard (1997) provides a nice discussion of the DQM condition in
these regular cases. This is the approach we take for analyzing the asymptotic behavior of
the C(α) test for heterogeneity. We will show below that by modifying the DQM condition
slightly, we can obtain the local asymptotic normality of the log-likelihood ratio and
establish the asymptotic optimality of the C(α) test for the irregular cases under assumptions
weaker than those suggested by Neyman's classical approach.

Suppose we have a random sample $(X_1, \ldots, X_n)$ with density function $p(x; \xi, \theta)$ with
respect to some measure $\mu$. The joint distribution of this i.i.d. random sample will be
denoted as $P_{n,\xi,\theta}$, which is the product of $n$ copies of the marginal distribution $P(x; \xi, \theta)$.

Assumption 1. The density function $p$ satisfies the following conditions:

(1) $\xi_0$ is an interior point of $\Xi$.

(2) For all $\theta \in \Theta \subset \mathbb{R}^p$ and $\xi \in \Xi \subset \mathbb{R}$, the density is twice continuously differentiable
with respect to $\xi$ and once continuously differentiable with respect to $\theta$ for all $x$.

(3) Denoting the first two derivatives of the density with respect to $\xi$ evaluated under
the null as $\nabla_\xi p(x; \xi_0, \theta)$ and $\nabla^2_\xi p(x; \xi_0, \theta)$, we have $P(\nabla_\xi p(x; \xi_0, \theta) = 0) = 1$ and
$P(\nabla^2_\xi p(x; \xi_0, \theta) \neq 0) > 0$.

(4) Denoting the derivative of the density with respect to $\theta$ evaluated under the null as
$\nabla_\theta p(x; \xi_0, \theta)$, for any $p$-dimensional vector $a$, $P(\nabla^2_\xi p(x; \xi_0, \theta) \neq a^\top \nabla_\theta p(x; \xi_0, \theta)) > 0$.

Remark. Here $\xi$ is the parameter under test and $\theta$ is the vector of nuisance parameters.
The list of regularity conditions in Assumption 1 tailors the standard conditions for a regular
C(α) test to the heterogeneity test we consider here. In particular, condition (3) reflects
the irregularity of these tests: the first order logarithmic derivative with respect to $\xi$
vanishes but the second-order derivative does not. Condition (2) secures existence
of the respective derivatives. Condition (4) rules out the case where there is a perfect linear
relationship between the second-order score for $\xi$ and the score for $\theta$. It ensures that the new
Fisher information thus defined is non-singular and the C(α) test statistic is
non-degenerate.

Under Assumption 1, we can now define the modified DQM condition that is crucial for 
establishing the local asymptotic normality of the model. 

Definition 1. The density $p(x; \xi, \theta)$ satisfies the modified differentiability in quadratic
mean condition at $(\xi_0, \theta)$ if there exists a vector $v(x) = (v_\xi(x), v_\theta(x)^\top)^\top \in L^2(\mu)$ such that
as $(\xi_n, \theta_n) \to (\xi_0, \theta)$,

$$\int \Big| \sqrt{p(x; \xi_n, \theta_n)} - \sqrt{p(x; \xi_0, \theta)} - h_n^\top v(x) \Big|^2 d\mu(x) = o(\|h_n\|^2)$$

where $h_n = ((\xi_n - \xi_0)^2, (\theta_n - \theta)^\top)^\top$. Here $\|\cdot\|$ denotes the Euclidean norm and $L^2(\mu)$
denotes the $L^2$ space of square integrable functions with respect to the measure $\mu$.

Furthermore, let $\beta(h_n)$ be the mass of the part of $p(x; \xi_n, \theta_n)$ that is $p(x; \xi_0, \theta)$-singular,
then as $(\xi_n, \theta_n) \to (\xi_0, \theta)$,

$$\frac{\beta(h_n)}{\|h_n\|^2} \to 0.$$

Usually the vector $v(x)$ contains derivatives of the square root of the density $\sqrt{p(x; \xi_n, \theta_n)}$
with respect to each parameter, evaluated at its null value. Definition 1 modifies the
classical DQM condition such that whenever the first order derivative is identically zero for
certain parameters, it is differentiated again until it is nonvanishing. The corresponding
terms in $h_n$ also need to be raised to the same power. For the heterogeneity test, the
score function with respect to $\xi$ is of second order and its associated term in $h_n$ is hence
quadratic. This further implies that the contiguous alternatives must be $O(n^{-1/4})$. For the
following theorems, we will thus focus on the sequence of local models on $(X_1, \ldots, X_n)$ with
joint distribution $P_{n,\xi_n,\theta_n}$ in which $\xi_n = \xi_0 + \delta_1 n^{-1/4}$ and $\theta_n = \theta + \delta_2 n^{-1/2}$.



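Theorem 1. Suppose $(X_1, \ldots, X_n)$ are i.i.d. random variables with joint distribution
$P_{n,\xi_n,\theta_n}$ and the density satisfies Assumption 1 and the modified DQM condition with

$$v(x) = (v_\xi(x), v_\theta(x)^\top)^\top
= \Big( \frac{1}{4} \frac{\nabla^2_\xi p(x; \xi_0, \theta)}{\sqrt{p(x; \xi_0, \theta)}} \mathbf{1}[p(x; \xi_0, \theta) > 0],\;
\frac{1}{2} \frac{\nabla_\theta p(x; \xi_0, \theta)^\top}{\sqrt{p(x; \xi_0, \theta)}} \mathbf{1}[p(x; \xi_0, \theta) > 0] \Big)^\top,$$

then for fixed $\delta_1$ and $\delta_2$, the log-likelihood ratio has the following quadratic approximation
under the null:

$$\Lambda_n = \log \frac{dP_{n,\xi_n,\theta_n}}{dP_{n,\xi_0,\theta}} = t^\top S_n - \frac{1}{2} t^\top J t + o_p(1)$$

where $t = (\delta_1^2, \delta_2^\top)^\top$,

$$S_n = (S_{\xi,n}, S_{\theta,n}^\top)^\top
= \frac{2}{\sqrt{n}} \sum_{i=1}^n \Big( \frac{v_\xi(X_i)}{\sqrt{p(X_i; \xi_0, \theta)}},\; \frac{v_\theta(X_i)^\top}{\sqrt{p(X_i; \xi_0, \theta)}} \Big)^\top$$

and

$$J = 4 \int v(x) v(x)^\top d\mu(x)
= \begin{pmatrix} E(S_{\xi,n}^2) & \mathrm{Cov}(S_{\xi,n}, S_{\theta,n}^\top) \\ \mathrm{Cov}(S_{\xi,n}, S_{\theta,n}) & E(S_{\theta,n} S_{\theta,n}^\top) \end{pmatrix}
= \begin{pmatrix} J_{\xi\xi} & J_{\xi\theta} \\ J_{\theta\xi} & J_{\theta\theta} \end{pmatrix}.$$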



Corollary 1. With $S_n$ and $J$ defined as in Theorem 1, we have

$$S_n \rightsquigarrow N(0, J) \quad \text{under } P_{n,\xi_0,\theta},$$

and hence the sequence of models $P_{n,\xi_n,\theta_n}$ is locally asymptotically normal (LAN) at $(\xi_0, \theta)$
with $S_n$ being interpreted as the score vector and $J$ as the associated Fisher information
matrix. Furthermore, $P_{n,\xi_n,\theta_n}$ is mutually contiguous to $P_{n,\xi_0,\theta}$.

Theorem 1 shows that under Assumption 1, the modified DQM condition is sufficient
for obtaining a quadratic approximation of the log-likelihood ratio for the sequence of local
models in the $n^{-1/4}$ neighborhood of the null value $\xi_0$ and the $n^{-1/2}$ neighborhood of the
nuisance parameter $\theta$. The joint normality of the vector $S_n$, as established in Corollary 1,
further indicates the LAN property of this sequence of models. It is important to note that
the vector $S_n$, in which the degenerately zero first-order score function for $\xi$ is replaced
by the corresponding second-order derivative of the log-likelihood, acts as the score vector
in this irregular case. Naturally, $J$ has the interpretation of the Fisher information matrix.
Under Assumption 1, since we rule out perfect dependence between $S_{\xi,n}$ and $S_{\theta,n}$ in
condition (4), $J$ is non-singular.

Having established the LAN property of this sequence of local models, we can now make
use of LeCam's (1972) limit experiment theory to show that the C(α) test is locally
asymptotically optimal.

Following the definitions given in LeCam (1972) and van der Vaart (1998), an experiment
$\mathcal{E}$ indexed by a parameter set $H$ is a collection of probability measures $\{P_h : h \in H\}$ on the
sample space $(\mathcal{X}, \mathcal{A})$. A sequence of experiments $\mathcal{E}_n = (\mathcal{X}_n, \mathcal{A}_n, P_{n,h} : h \in H)$ is said to
converge to a limit experiment $\mathcal{E} = (\mathcal{X}, \mathcal{A}, P_h : h \in H)$ if the likelihood ratio process for
$\mathcal{E}_n$, $\frac{dP_{n,h}}{dP_{n,h_0}}(X_n)$, converges in distribution to the likelihood ratio of the limit experiment,
$\frac{dP_h}{dP_{h_0}}(X)$, for $h$ in a finite subset $I \subset H$ and $h_0$ being the true null value. A common feature
is that many sequences of experiments have a Gaussian limit experiment. One important
example is that for an i.i.d. sample from a smooth parametric model with distribution $P_\vartheta$, if the
sequence of local models $P_{n,\vartheta_n}$, in which $\vartheta_n = \vartheta_0 + r_n \delta$ with $r_n$ the appropriate
norming rate, is locally asymptotically normal, then it has a Gaussian shift experiment as its limit.

The advantage of establishing the limit experiment is severalfold. First, the limit
experiment is often easier to analyze than the original sequence of models. Second, the limit
experiment provides a bound for the optimal estimation (in terms of a lower bound on the
asymptotic variance) or testing procedure (in terms of an upper bound on the asymptotic power)
one could achieve in the original model. Third, by van der Vaart's (1991) asymptotic
representation theory, the optimal procedure found for the limit experiment can be matched
to a sequence of statistics in the original experiment that preserve the identical
asymptotic behavior. We will show below that the optimal test statistic found in the Gaussian
shift limit experiment is matched by the C(α) test for the heterogeneity test problems. We
will first focus on the scalar case, leaving the multi-dimensional case to a separate discussion.

Theorem 2. Let $\mathcal{E}_n$ be a sequence of experiments based on i.i.d. random variables
$(X_1, \ldots, X_n)$ with joint distribution $P_{n,\xi_n,\theta_n}$ on the sample space $(\mathcal{X}_n, \mathcal{A}_n)$. We further
index the sequence of experiments by $t = (\delta_1^2, \delta_2^\top)^\top \in \mathbb{R}_+ \times \mathbb{R}^p$. The log-likelihood ratio of
the sequence of models satisfies

$$\log \frac{dP_{n,\xi_n,\theta_n}}{dP_{n,\xi_0,\theta}} = t^\top S_n - \frac{1}{2} t^\top J t + o_p(1),$$

with the score vector $S_n$ defined as in Theorem 1 converging in distribution under the null
to $N(0, J)$. Then the sequence of experiments $\mathcal{E}_n$ converges to the limit experiment based
on observing one sample from $Y = t + v$, where $v \sim N(0, J^{-1})$. The locally asymptotically
optimal statistic for testing $H_0 : \delta_1 = 0$ vs. $H_a : \delta_1 \neq 0$ is

$$Z_n = (J_{\xi\xi} - J_{\xi\theta} J_{\theta\theta}^{-1} J_{\theta\xi})^{-1/2} (S_{\xi,n} - J_{\xi\theta} J_{\theta\theta}^{-1} S_{\theta,n}).$$

Corollary 2. Under $H_0$, $Z_n$ has distribution $N(0, 1)$. Under $H_a$, by applying LeCam's
third lemma (see e.g. van der Vaart (1998, Example 6.7)), it follows a shifted normal
distribution $N(\delta_1^2 (J_{\xi\xi} - J_{\xi\theta} J_{\theta\theta}^{-1} J_{\theta\xi})^{1/2}, 1)$.

The optimal test statistic $Z_n$ takes the form of a C(α) test. It projects the second-order
score $S_{\xi,n}$ for $\xi$ onto the space spanned by the first-order score vector $S_{\theta,n}$ for $\theta$. It is the
sequence of statistics from the original experiment matched to the optimal test statistic in
the limit Gaussian experiment for inference on $\delta_1$.

One common feature of C(α) heterogeneity tests is that the limit distribution under local
alternatives is always a right-shifted normal distribution even if we have a two-sided
alternative hypothesis for $\delta_1$. This is not surprising given that the shift parameter corresponding
to $\xi$ in the Gaussian limit experiment is a quadratic term $\delta_1^2 \in \mathbb{R}_+$. In other words, the
best inference procedure one could possibly achieve in the limit experiment is for $\delta_1^2$. We
lose the sign information on $\delta_1$, and the asymptotically optimal test, if it rejects the null,
fails to distinguish whether the deviation is from the left or from the right (this
phenomenon is also emphasized in Rotnitzky, Cox, Bottai, and Robins (2000)). The one-sidedness
of the test implies that we reject $H_0$ if $(0 \vee Z_n)^2 > c$, where $c$ is the $(1 - \alpha)$-quantile
of $\frac{1}{2}\chi^2_0 + \frac{1}{2}\chi^2_1$ and $\chi^2_0$ is a degenerate distribution with mass 1 at 0. The weight 1/2
associated with $\chi^2_0$ is due to the fact that $Z_n$ takes a negative value with probability 1/2 under $H_0$.
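
The decision rule can be sketched as follows, assuming a scalar nuisance parameter and a significance level below one half. For $c > 0$ the event $(0 \vee Z_n)^2 > c$ coincides with $Z_n > \sqrt{c}$, so the critical value is the square of a standard normal quantile. The function names are hypothetical, and the score and information inputs are taken as given rather than estimated.

```python
import math
from statistics import NormalDist


def z_statistic(s_xi, s_theta, j_xixi, j_xitheta, j_thetatheta):
    """Z_n for scalar xi and scalar theta (s_xi is the second-order score)."""
    resid = s_xi - (j_xitheta / j_thetatheta) * s_theta
    j_eff = j_xixi - j_xitheta ** 2 / j_thetatheta
    return resid / math.sqrt(j_eff)


def critical_value(alpha):
    """(1 - alpha)-quantile of 0.5*chi2_0 + 0.5*chi2_1, for 0 < alpha < 0.5."""
    if not 0.0 < alpha < 0.5:
        raise ValueError("alpha must lie in (0, 0.5)")
    # P((0 v Z)^2 > c) = P(Z > sqrt(c)) for c > 0, so sqrt(c) is the
    # (1 - alpha) standard normal quantile.
    return NormalDist().inv_cdf(1.0 - alpha) ** 2


def reject_homogeneity(z_n, alpha=0.05):
    """One-sided rule: reject H0 when (0 v Z_n)^2 exceeds the critical value."""
    return max(0.0, z_n) ** 2 > critical_value(alpha)
```

A large negative `z_n` never leads to rejection under this rule, which is the one-sidedness described above.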

There is another intuitive interpretation of the one-sidedness of the test, as we have
already anticipated in Section 2.2. The C(α) test $Z_n$, constructed from the second-order score
for $\xi$, exploits information on the curvature of the log-likelihood function. Since at $\xi = \xi_0$
the gradient of the log-likelihood function with respect to $\xi$ is always zero, it depends on the
sign of the second-order derivative to determine whether the null point is a local maximum or
a local minimum. Only a positive value of $Z_n$ indicates the null point as a local minimum of the
log-likelihood function, leading to a rejection of the null hypothesis. As $n \to \infty$, due to
normality of $Z_n$, only half the time do we get the "correct" curvature allowing us to reject the null.

In our random parameter model, one could of course also consider a likelihood ratio test
as an alternative testing strategy for heterogeneity. Among many others, Chen, Chen, and
Kalbfleisch (2001) consider a modified likelihood ratio test for homogeneity in finite
mixture models, which is very close to the setup we consider in this paper. They also obtain
mixture of $\chi^2$ asymptotics for their likelihood ratio test statistics. The modified LRT can
be viewed as an asymptotically equivalent testing procedure in finite mixture models to the
C(α) test considered here. The latter, however, inheriting the nice feature of the score test,
is much easier to compute. Furthermore, the C(α) test statistic does not depend on the
specification of $F$ as long as the moment conditions are satisfied. This can be viewed as a
merit of the test because it has power for a large class of alternative models. On the other
hand, it can also be viewed as a disadvantage because rejecting the hypothesis does not
provide information on what a plausible alternative might be.

The result established thus far is not specialized to the heterogeneity test problem. It
is applicable whenever the first-order score for the parameter under test vanishes but the
second-order score is non-degenerate. There is another possible scenario for the score test to
break down, in which none of the first-order score functions vanishes, but there is linear
dependence among them, and thus the Fisher information matrix becomes singular. This
is the case discussed in much detail in Lee and Chesher (1986). Models with selection bias
and the stochastic production frontier models fall into this class. They propose an extremum
test which is based on the determinant of the matrix of the second-order derivatives of the
log likelihood function and show the asymptotic optimality of the test. The extremum test
can essentially be reformulated, using a reparameterization slightly different from what the
authors suggested in the paper (i.e. choose k to be 1 in Lee and Chesher (1986, p. 132)), to
fit into the conditions described in Assumption 1. A similar irregularity also arises in the test
for symmetry in the skew-normal distribution and is investigated in Hallin and Ley (2012). The
reparameterization is a Gram-Schmidt orthogonalization in the same spirit as Rotnitzky,
Cox, Bottai, and Robins (2000, Section 4.4). The C(α) test can then be constructed and
asymptotic optimality of the test follows.



2.4. Replacing the nuisance parameter by a $\sqrt{n}$-consistent estimator. Notice that
the optimal test statistic $Z_n$ we obtained in Theorem 2 is a function of $\theta$. To make the
test statistic feasible under unknown nuisance parameters, we need to replace $\theta$ by some
estimator $\hat\theta$. In order to ensure that the asymptotics for the test statistic $Z_n$ in Corollary 2
are still valid, it suffices to show that $Z_n(\hat\theta) - Z_n(\theta) = o_p(1)$ both under the null and under local
alternatives. There are various ways to obtain this result. The classical approach taken in
Neyman (1959) was to impose additional differentiability and boundedness conditions on the test
function $g(x_i, \theta)$, which is defined as

$$g(x_i, \theta) = (J_{\xi\xi} - J_{\xi\theta} J_{\theta\theta}^{-1} J_{\theta\xi})^{-1/2} \big( s_\xi(x_i; \theta) - J_{\xi\theta} J_{\theta\theta}^{-1} s_\theta(x_i; \theta) \big),$$

where $s_\xi(x_i; \theta)$ and $s_\theta(x_i; \theta)$ denote the contributions of observation $i$ to the score vector,
$S_n = n^{-1/2} \sum_i (s_\xi(x_i; \theta), s_\theta(x_i; \theta)^\top)^\top$,
such that $Z_n(\theta) = n^{-1/2} \sum_i g(x_i, \theta)$. Details of these assumptions can be found in Neyman
(1959, Definition 3 (ii) (iii)) and we will not replicate them here. When the conditions are
satisfied, a Taylor expansion of $Z_n(\hat\theta)$ around $Z_n(\theta)$ yields the desired results for $\hat\theta$ being any
$\sqrt{n}$-consistent estimator for $\theta$. Neyman's assumptions are rather strong. For example, he
requires the density to be three times differentiable with respect to $\theta$ and also moments of
the gradient of $g$ with respect to $\theta$ to be continuous. Another approach, using more modern
probability theory, is to view the difference $Z_n(\hat\theta) - Z_n(\theta)$ as an empirical process. More
precisely, we make the following assumption on the test function $g(x, \theta)$.

Assumption 2. There exists some $\delta > 0$ such that for any $\eta, \eta' \in U_\delta(\theta)$ we have for some
$\gamma > 0$

$$|g(x, \eta) - g(x, \eta')| \leq \|\eta - \eta'\|^\gamma H(x)$$

for $P_{n,\xi_n,\theta}$-almost all $x$ (for every $n \in \mathbb{N}$) where $H$ is square integrable with respect to
$P_{n,\xi_n,\theta}$ for all $n \in \mathbb{N}$, $\sup_n E_{P_{n,\xi_n,\theta}} H^2(X) < \infty$ and additionally for some $\epsilon_n = o(1)$,

$$n^{1/2} E_{P_{n,\xi_n,\theta}} \big[ H(X) \mathbf{1}\{H(X) > n^{1/2} \epsilon_n\} \big] = O(1).$$

Theorem 3. Under Assumption 2, if $\hat\theta$ is a $\sqrt{n}$-consistent estimator for $\theta$, then

$$|Z_n(\hat\theta) - Z_n(\theta)| = o_p(1).$$

2.5. C(α) test for parameter heterogeneity in higher dimensions. It is of interest
to generalize the C(α) tests of unobserved parameter heterogeneity to higher dimensions in
the irregular case. For example, under a Gaussian model, we may want to jointly test
for heterogeneity in both location and scale parameters. The main challenge comes from
the one-sidedness of the test. In higher dimensions, it is natural to look for analogues of a
one-sided test. The limit experiment turns out to be multivariate Gaussian with location
shifts in each coordinate towards the right tail. This requires us to look for optimal tests
for deviations of the location parameters from zero restrictions to the positive orthant.

To be more specific, suppose the limit multivariate Gaussian experiment has mean
vector $(\mu_1, \ldots, \mu_q)$. We would like to test $H_0 : \mu_i = 0$ for $i = 1, \ldots, q$ against the alternative
$H_a : \mu_i \geq 0$ for $i = 1, \ldots, q$ with at least one inequality holding strictly. This testing problem
has been studied by several authors, particularly for the likelihood ratio test. Chernoff
(1954) extends the classical Wilks result on the likelihood ratio test (LRT) to cases in which
the null value of the parameters under test lies on the boundary of the parameter space.
Hillier (1986) provides details for the LRT with dimension equal to three. Self and Liang
(1987) give some further examples for the LRT with nuisance parameters. Gourieroux, Holly,
and Monfort (1982) consider testing problems in linear regression models with non-negative
constraints on the regression coefficients. Test statistics for these one-sided test problems
in multiple dimensions all have a mixture of $\chi^2$ distributions with different degrees of freedom as their
asymptotic distribution. The weights of these $\chi^2$'s get complicated very quickly as the dimension
increases. We will present in detail the joint test for heterogeneity in two parameters as an
illustration and comment on the more general case. Bühler and Puri (1966) extend the
regular C(α) case to higher dimensions. The asymptotic distribution for the multidimensional
C(α) test for heterogeneity is different from that in Bühler and Puri (1966) due to the
positivity constraints.

Suppose again we have an i.i.d. random sample $(X_1, \ldots, X_n)$ with density $p(x; \xi, \theta)$. The
parameters under test are now $\xi = (\xi_1, \xi_2) \in \Xi \subset \mathbb{R}^2$. They take null value $\xi_0 = (\xi_{10}, \xi_{20})$
and $\theta \in \Theta \subset \mathbb{R}^p$ are the nuisance parameters. For heterogeneity tests in particular, we
consider testing for heterogeneity of a vector of parameters, $\lambda$, of the model. Under the
alternative, they take the form $\lambda_{ik} = \theta_k + \tau_k \xi_k U_{ik}$ for $k = 1, 2$, where the $U_{ik}$ are independent
random variables. Under $H_0$, $\xi_k = 0$, so that the $\lambda_k$'s are homogeneous across individuals, taking
value $\theta_k$.

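The density function satisfies Assumption 1, so that the first-order scores for $\xi_1$ and $\xi_2$
vanish but the second-order scores do not. It also satisfies the modified DQM
condition so that the model is locally asymptotically normal. We denote the second-order
score for $(\xi_1, \xi_2)$ as $(S_{\xi_1,n}, S_{\xi_2,n})$ and the first-order score for $\theta$ as $S_{\theta,n}$. More specifically,
under regularity conditions, they are

$$S_{\xi_k,n} = \frac{1}{\sqrt{n}} \sum_i \frac{1}{2} \frac{\nabla^2_{\xi_k} p(x_i; \xi_0, \theta)}{p(x_i; \xi_0, \theta)} \mathbf{1}[p(x_i; \xi_0, \theta) > 0]
\quad \text{and} \quad
S_{\theta,n} = \frac{1}{\sqrt{n}} \sum_i \frac{\nabla_\theta p(x_i; \xi_0, \theta)}{p(x_i; \xi_0, \theta)} \mathbf{1}[p(x_i; \xi_0, \theta) > 0].$$

Let the associated information matrix be denoted as

$$J = \begin{pmatrix} J_{\xi\xi} & J_{\xi\theta} \\ J_{\theta\xi} & J_{\theta\theta} \end{pmatrix}$$

with $J_{\xi\xi}$ being a $2 \times 2$ block matrix. The residual score for $\xi$, similar to the scalar case, is
found to be

$$S^*_{\xi,n} = S_{\xi,n} - J_{\xi\theta} J_{\theta\theta}^{-1} S_{\theta,n}$$

and the covariance matrix for $S^*_{\xi,n}$ is

$$\Sigma = J_{\xi\xi} - J_{\xi\theta} J_{\theta\theta}^{-1} J_{\theta\xi} := \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix}.$$

By the Cholesky decomposition of the matrix $\Sigma$, there exists a lower triangular matrix $L$ such that $\Sigma = L L^\top$, with

$$L = \begin{pmatrix} \sqrt{\sigma_{11}} & 0 \\ \rho \sqrt{\sigma_{22}} & \sqrt{1 - \rho^2} \sqrt{\sigma_{22}} \end{pmatrix}$$

where $\rho = \sigma_{12} / (\sqrt{\sigma_{11}} \sqrt{\sigma_{22}})$ is the correlation coefficient between $S^*_{\xi_1,n}$ and $S^*_{\xi_2,n}$.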



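Theorem 4. Let $\mathcal{E}_n$ be the sequence of experiments based on i.i.d. random variables
$(X_1, \ldots, X_n)$ with joint distribution $P_{n,\xi_n,\theta_n}$ with $\xi_n = (\xi_{10}, \xi_{20}) + (\delta_1, \delta_2) n^{-1/4}$ and
$\theta_n = \theta + \delta_3 n^{-1/2}$ on the sample space $(\mathcal{X}_n, \mathcal{A}_n)$. The log-likelihood ratio of the sequence of
experiments satisfies

$$\log \frac{dP_{n,\xi_n,\theta_n}}{dP_{n,\xi_0,\theta}} = t^\top S_n - \frac{1}{2} t^\top J t + o_p(1),$$

with $S_n = (S_{\xi_1,n}, S_{\xi_2,n}, S_{\theta,n}^\top)^\top \rightsquigarrow N(0, J)$. Then the limit experiment of $\mathcal{E}_n$ is based on
observing one sample from $Y = t + v$ with $t = (\delta_1^2, \delta_2^2, \delta_3^\top)^\top \in \mathbb{R}_+^2 \times \mathbb{R}^p$ and $v \sim N(0, J^{-1})$.
We would like to jointly test $H_0 : \delta_1 = \delta_2 = 0$ against the alternative $H_a : \delta_1 \neq 0$ or $\delta_2 \neq 0$.
Define $(w_{1n}, w_{2n})^\top = L^{-1} S^*_{\xi,n}$, where $L$ is the lower triangular Cholesky factor of $\Sigma$, that is,

$$w_{1n} = \frac{S^*_{\xi_1,n}}{\sqrt{\sigma_{11}}}, \qquad
w_{2n} = \frac{1}{\sqrt{1 - \rho^2}} \Big( \frac{S^*_{\xi_2,n}}{\sqrt{\sigma_{22}}} - \rho \frac{S^*_{\xi_1,n}}{\sqrt{\sigma_{11}}} \Big).$$

The optimal C(α) test statistic is one of the following four cases:

$$T_n = \begin{cases}
w_{1n}^2 + w_{2n}^2 & \text{if } w_{2n} \geq 0,\ \sqrt{1 - \rho^2}\, w_{1n} \geq \rho\, w_{2n} \\
w_{1n}^2 & \text{if } w_{1n} \geq 0,\ w_{2n} < 0 \\
(\rho w_{1n} + \sqrt{1 - \rho^2}\, w_{2n})^2 & \text{if } \rho w_{1n} + \sqrt{1 - \rho^2}\, w_{2n} \geq 0,\ \sqrt{1 - \rho^2}\, w_{1n} < \rho\, w_{2n} \\
0 & \text{otherwise.}
\end{cases}$$

Under $H_0$, the asymptotic distribution of $T_n$ follows
$(\frac{1}{2} - \frac{\beta}{2\pi}) \chi^2_0 + \frac{1}{2} \chi^2_1 + \frac{\beta}{2\pi} \chi^2_2$, where $\beta = \cos^{-1}(\rho)$.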
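
The mixture weights and the resulting null tail probability are simple to evaluate. The sketch below assumes the residual-score correlation ρ is known and uses the closed-form survival functions of the $\chi^2_1$ and $\chi^2_2$ distributions; the function names are hypothetical.

```python
import math


def chi_bar_weights(rho):
    """Weights on (chi2_0, chi2_1, chi2_2) for residual-score correlation rho."""
    beta = math.acos(rho)
    return (0.5 - beta / (2.0 * math.pi), 0.5, beta / (2.0 * math.pi))


def null_tail_probability(c, rho):
    """P(T_n > c) under H0 for c > 0, from the chi-bar-square mixture."""
    if c <= 0.0:
        raise ValueError("c must be positive")
    _, w1, w2 = chi_bar_weights(rho)
    sf_chi2_1 = math.erfc(math.sqrt(c / 2.0))  # survival function, 1 d.f.
    sf_chi2_2 = math.exp(-c / 2.0)             # survival function, 2 d.f.
    # The chi2_0 component is a point mass at zero and contributes nothing.
    return w1 * sf_chi2_1 + w2 * sf_chi2_2
```

With uncorrelated residual scores the weights are (1/4, 1/2, 1/4). As ρ approaches 1 the $\chi^2_2$ weight vanishes and the tail probability coincides with that of the scalar one-sided test.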

Remark. The optimal test statistic $T_n$ is constructed by sequential conditioning. We first
find the residual score $S^*_{\xi,n}$ for $(\xi_1, \xi_2)$ by conditioning on the score for $\theta$. LeCam's third lemma
implies that asymptotically $S^*_{\xi,n}$ follows $N(0, \Sigma)$ under $H_0$ and $N(\Sigma (\delta_1^2, \delta_2^2)^\top, \Sigma)$ under local
alternatives. A further conditioning (see Rao (1973, p. 523)) breaks the multivariate normal
for $S^*_{\xi,n}$ into two orthogonal marginals. In particular, $w_{2n}$ is the conditional of $S^*_{\xi_2,n}$ on
$S^*_{\xi_1,n}$.

As the dimension of the parameter under test grows, the sequential conditioning argument described above implies that the optimal test statistic will still follow a mixture of $\chi^2$ distributions asymptotically under the null, albeit with more complex weights. If $J$ happens to be a diagonal matrix, the weights take a very simple form. Let $\xi \in \Xi \subset \mathbb{R}^q$ and let the residual score for $\xi$ be $S^*_{\xi,n}$ with covariance matrix $\Sigma_q$. The diagonality of $J$ implies diagonality of $\Sigma_q$. The optimal test statistic for $H_0: \xi_1 = \cdots = \xi_q = 0$ against $H_a: \xi_i \neq 0$ for at least one $i$ is

$$T_n = (0 \vee S^*_{\xi,n})^\top \Sigma_q^{-1} (0 \vee S^*_{\xi,n}).$$

Under $H_0$, $T_n \sim \sum_{i=0}^{q} \binom{q}{i} 2^{-q} \chi^2_i$.
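
The diagonal case can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the residual scores and their variances (the diagonal of the covariance matrix) are taken as given inputs, and the names `chi2_tail` and `diagonal_chibar2` are hypothetical. The $\chi^2_k$ tail probability is built from the standard recurrence for the regularized incomplete gamma function, so no external library is required.

```python
import math

def chi2_tail(x, k):
    """P(chi2_k >= x) for integer k >= 0; chi2_0 is a point mass at zero."""
    if k == 0:
        return 1.0 if x <= 0 else 0.0
    if x <= 0:
        return 1.0
    h = x / 2.0
    if k % 2 == 1:
        tail, df = math.erfc(math.sqrt(h)), 1   # chi2_1 tail
    else:
        tail, df = math.exp(-h), 2              # chi2_2 tail
    while df < k:
        a = df / 2.0
        # Recurrence Q(a + 1, h) = Q(a, h) + h^a e^{-h} / Gamma(a + 1)
        tail += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        df += 2
    return tail

def diagonal_chibar2(scores, variances):
    """Return (T_n, p-value) when the residual-score covariance is diagonal."""
    q = len(scores)
    # Only the positive parts of the residual scores contribute.
    t = sum(max(0.0, s) ** 2 / v for s, v in zip(scores, variances))
    # Binomial(q, 1/2) weights on chi2_0, ..., chi2_q.
    p = sum(math.comb(q, i) * 0.5 ** q * chi2_tail(t, i) for i in range(q + 1))
    return t, p
```

With $q = 2$ the weights reduce to (1/4, 1/2, 1/4), which agrees with the two-dimensional result when $\rho = 0$.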

3. Examples 

In this section, we describe three examples of using the C(a) test for unobserved param- 
eter heterogeneity in various models. The first two examples lead to similar test statistics 
already familiar in the literature. The last example, on jointly testing for location and scale
parameter heterogeneity in a Gaussian panel data model, is new. Special cases of this example
lead to test statistics similar to the test for individual effects in Breusch and Pagan (1980). 

3.1. Tests for overdispersion in Poisson Regression. Overdispersion tests for Poisson
models constitute the most common example of tests for parameter heterogeneity. Such a
test was proposed in Fisher (1950) and also serves as the motivating example in Neyman 
and Scott (1966). We will consider two distinct versions of the test for heterogeneity of the 
intercept in the Poisson regression model. 

3.1.1. Second Moment Test. Suppose we have $(Y_1, \dots, Y_n)$ as i.i.d. random variables following Poisson distributions with mean parameters $\lambda_i$. We further assume that

$$\lambda_i = \lambda_{0i} e^{\xi U_i} = \exp(x_i^\top\beta + \xi U_i)$$

where the $U_i$ are i.i.d. with distribution $F$, zero mean and unit variance. We have set $\tau$ to be 1 without loss of generality. The $x_i$'s are covariates of the Poisson regression model including an intercept term. These covariates could be viewed as observed heterogeneity of the mean parameter, while $U_i$, since it is not explained by the covariates, is unobserved heterogeneity. Thus, the intercept coefficient, $\beta_0$, given the assumed form for $\lambda_i$, can be regarded as a random coefficient. We would like to test $H_0: \xi = 0$ against $H_a: \xi \neq 0$ with $\beta$ as the






unspecified nuisance parameters. Since the first-order score with respect to $\xi$ vanishes, this problem falls into the framework we considered in Section 2.

Let $f(y_i; \xi, \beta)$ be the Poisson density function. With the respective second-order score for $\xi$ and the first-order score for $\beta$, the residual score is defined as:

$$g(y_i, \beta) = \nabla^2_\xi \log f(y_i; 0, \beta) - a^\top \nabla_\beta \log f(y_i; 0, \beta)$$

where $a$ is the vector of regression coefficients from projecting the score of $\xi$ on the space spanned by the score vector of $\beta$. Under $H_0$, it is easy to see that $a^\top = [1, 0, \dots, 0]$ and hence

$$g(y_i, \beta) = (y_i - \exp(x_i^\top\beta))^2 - \exp(x_i^\top\beta) - (y_i - \exp(x_i^\top\beta))$$

and further, $V(g(Y_i, \beta)) = 2\exp(2x_i^\top\beta)$. If the MLE $\hat\beta$, found by solving the normal equations $\sum_i (y_i - \exp(x_i^\top\hat\beta))x_i = 0$, is used as the $\sqrt{n}$-consistent estimator to replace $\beta$, we have the locally optimal C(α) test statistic as

$$Z_n = \frac{\sum_i g(y_i, \hat\beta)}{\sqrt{\sum_i V(g(Y_i, \hat\beta))}} = \frac{\sum_i \left[(y_i - \exp(x_i^\top\hat\beta))^2 - \exp(x_i^\top\hat\beta)\right]}{\sqrt{2\sum_i \exp(2x_i^\top\hat\beta)}}.$$

We call this the second moment test because $Z_n$ is essentially comparing the sample second moment with the second moment for the Poisson model under $H_0$.
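
The statistic is simple to compute once the null model has been fitted. The following Python sketch is illustrative only: it assumes the fitted null means $\exp(x_i^\top\hat\beta)$ are supplied by some Poisson regression routine (not shown), that the model contains an intercept so the normal equations hold, and that the test is carried out one-sided against the upper tail of $N(0,1)$. The function name `second_moment_test` is hypothetical.

```python
import math

def second_moment_test(y, mu_hat):
    """Z_n from counts y and fitted null means mu_hat[i] = exp(x_i' beta_hat)."""
    num = sum((yi - mi) ** 2 - mi for yi, mi in zip(y, mu_hat))
    den = math.sqrt(2.0 * sum(mi * mi for mi in mu_hat))
    z = num / den
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2.0))   # upper-tail N(0, 1) probability
    return z, p_one_sided

# Intercept-only illustration: the restricted MLE of the mean is the sample average.
y = [0, 1, 2, 5, 7]
mu_hat = [sum(y) / len(y)] * len(y)
z, p = second_moment_test(y, mu_hat)
```

In the intercept-only illustration the numerator compares the sum of squared deviations with $n\bar{y}$, which is the comparison underlying Fisher's dispersion test.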

Remark. The C(α) test constructed above is identical to the first test statistic proposed in Lee (1986) for overdispersion in Poisson regression models. It has also been discussed in Collings and Margolin (1985) and Cameron and Trivedi (1986), among many others, and can be viewed as an extension of Fisher's (1950) dispersion test for univariate Poisson models. In his derivation, Lee assumed that the Poisson mean parameter, $\lambda_i$, follows a Gamma distribution with a certain mean-variance ratio. The Poisson-Gamma compound distribution then leads to a negative binomial model. As Lee noted (p. 700), the same test statistic can also be derived under some other distributions in addition to the Gamma distribution (see also Dean and Lawless (1989)). From the C(α) perspective, the test statistic does not depend on the distribution of $U$, as long as the moment conditions are satisfied. However, it does depend on the particular specification of $\lambda_i$ as a function of the observed covariates and the unobservable $U_i$. This corresponds to the remark in Lee (1986) that if the mean-variance ratio for the Gamma distribution imposed on $\lambda_i$ is modified, one arrives at a different test statistic. He denotes it as the second factorial moment test. In the following example, we give the corresponding C(α) formulation.



3.1.2. Second Factorial Moment Test. If instead, under the same setup as in 3.1.1, we assume

$$\lambda_i = \lambda_{0i}\left[1 + \xi U_i/\sqrt{\lambda_{0i}}\right],$$

the residual score for $\xi$ is now found to be, with $\lambda_{0i} = \exp(x_i^\top\beta)$,

$$g(y_i, \beta) = \left[y_i(y_i - 1) - 2\lambda_{0i}(y_i - \lambda_{0i}) - \lambda_{0i}^2\right]/\lambda_{0i}$$

and $V(g(Y_i, \beta)) = 2$. Replacing $\beta$ by its restricted MLE $\hat\beta$, the locally optimal C(α) test is

$$Z_n = \frac{\sum_i g(y_i, \hat\beta)}{\sqrt{2n}}.$$

This is called the second factorial moment test because $Z_n$ is comparing the second sample factorial moment with that induced by a Poisson model. Note that this test reduces to the second moment test if there are no covariates.
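
The reduction to the second moment test without covariates can be checked numerically. The Python sketch below is illustrative: it assumes the factorial moment statistic is the sum of the residual scores divided by $\sqrt{2n}$, as implied by $V(g) = 2$, and it takes the fitted null means as inputs. Both function names are hypothetical.

```python
import math

def second_moment_z(y, mu_hat):
    """Second moment statistic from counts y and fitted null means mu_hat."""
    num = sum((yi - mi) ** 2 - mi for yi, mi in zip(y, mu_hat))
    return num / math.sqrt(2.0 * sum(mi * mi for mi in mu_hat))

def factorial_moment_z(y, mu_hat):
    """Second factorial moment statistic: sum of residual scores over sqrt(2n)."""
    g = [(yi * (yi - 1) - 2.0 * mi * (yi - mi) - mi * mi) / mi
         for yi, mi in zip(y, mu_hat)]
    return sum(g) / math.sqrt(2.0 * len(y))

# Without covariates the restricted MLE of every mean is the sample average,
# and the two statistics coincide.
y = [0, 1, 2, 5, 7]
mu_hat = [sum(y) / len(y)] * len(y)
z_second = second_moment_z(y, mu_hat)
z_factorial = factorial_moment_z(y, mu_hat)
```

When the fitted means vary across observations the two functions generally return different values, which reflects the dependence of the test on the assumed form of the alternative.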

Remark. Cox (1983) and Chesher (1984) provide a general framework for deriving a local
score test that has power against a general mixed Poisson alternative. Both approaches
can be viewed as a C(α) test for a particular form of the random-parameter heterogeneity.
Chesher (1984) also discusses an important link between the score test for heterogeneity 
and the Information Matrix test introduced by White (1982). For the two examples in the 
Poisson regression model, the Information Matrix test with respect to the intercept term 
is identical to the second moment test, but not to the second factorial moment test. This 
leads us to conclude that the C(a) test for heterogeneity is not in general identical to the 
Information Matrix test if we allow for covariates in the model. We will give a more general 
discussion on conditions for equivalence to hold between the two in Section 4. 

3.2. The Cox Proportional Hazard Model with Frailty. Introducing random effects into survival models is attractive because it is often implausible to assume that individuals are homogeneous even when the model includes covariates to control for the observed heterogeneity. Unobserved heterogeneity is often called frailty in survival models, as first proposed by Vaupel, Manton, and Stallard (1979). Survival models play an important role in the economics literature, especially in research on unemployment duration. Lancaster and Nickell (1980) and Heckman and Singer (1982) demonstrate that neglecting time invariant unobserved individual heterogeneity in survival models, such as Cox's proportional hazard model, can lead to wrong inferences on duration dependence. It is therefore of interest to develop a test for frailty in these models.

In the proportional hazard model, without further assumptions on the cumulative hazard function or the distribution of the frailty term, it is impossible to distinguish the effects of duration dependence and unobserved individual heterogeneity. For this reason, researchers often impose rather strong parametric assumptions. For example, Lancaster and Nickell (1980) assume a Weibull distribution for the baseline hazard and introduce a frailty term following a Gamma distribution. However, estimation results are usually sensitive to these arbitrary assumptions. Fortunately, Elbers and Ridder (1982) show that if the unobserved frailty term is multiplicative on the hazard and has finite mean, then with sufficient variation in the covariates $x$ it is possible to nonparametrically identify the model. Heckman and Singer (1984) replace the finite mean assumption by a tail restriction on the frailty distribution. Honoré (1993) further shows that the finite mean or the tail condition can be removed with multiple spell data if there is no lagged duration dependence and if the frailty is time invariant.
For simplicity of exposition, we work with single spell uncensored observations. Following the multiplicative-on-hazard assumption in the identification literature, we assume the individual conditional hazard function to be of the form,

$$\lambda(t_i \mid x_i, u_i) = \lambda_0(t_i)\exp(x_i^\top\beta)v_i = \lambda_0(t_i)\exp(x_i^\top\beta + \xi u_i),$$

where $v_i$ is the frailty term, further parametrized as $\exp(\xi U_i)$ with $U_i$ following distribution $F$ with zero mean and unit variance. The mean of the frailty $v_i$ is approximately 1 for $\xi$ small. The baseline hazard function $\lambda_0(t_i)$ is known up to a finite number of parameters and the $x_i$'s denote the vector of covariates including an intercept term. We would like to evaluate the advisability of introducing the frailty term, that is, again, to test $H_0: \xi = 0$ against $H_a: \xi \neq 0$.

The individual survival function and the unconditional density can be written as

$$S(t_i \mid x_i) = \int e^{-\Lambda_0(t_i)\exp(x_i^\top\beta + \xi u)}\,dF(u),$$
$$f(t_i \mid x_i) = \int \lambda_0(t_i)e^{x_i^\top\beta + \xi u}e^{-\Lambda_0(t_i)\exp(x_i^\top\beta + \xi u)}\,dF(u).$$

It is sometimes called a mixed proportional hazard model in the literature because $f$ is a mixture density and $F$ is the mixing distribution. With different assumptions on the baseline hazard, we have the following two examples.

3.2.1. Exponential Baseline Hazard. Assume that the baseline hazard follows the exponential distribution with $\Lambda_0(t) = t$ and $\lambda_0(t) = 1$. The residual score for $\xi$ is,

$$g(t_i, \beta) = \left(1 - 3t_ie^{x_i^\top\beta} + t_i^2e^{2x_i^\top\beta}\right) + \left(1 - t_ie^{x_i^\top\beta}\right).$$

Replacing $\beta$ by its MLE, which solves $\sum_i (1 - t_i\exp(x_i^\top\hat\beta))x_i = 0$, the optimal C(α) test is

$$Z_n = \sum_i \left(1 - 3t_i\exp(x_i^\top\hat\beta) + t_i^2\exp(2x_i^\top\hat\beta)\right)\Big/\sqrt{4n}.$$

This is identical to the test proposed in Kiefer (1984), which uses the approach in Cox (1983) and derives the score test based on whether the variance of the frailty term is zero in a local approximation of the mixture density.
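
A minimal Python sketch of this statistic follows. It is illustrative only and assumes single-spell uncensored durations with the fitted linear indices $x_i^\top\hat\beta$ supplied from an exponential regression fit that is not shown; the function name `exponential_frailty_test` is hypothetical.

```python
import math

def exponential_frailty_test(t, eta_hat):
    """Z_n from durations t and fitted linear indices eta_hat[i] = x_i' beta_hat."""
    # Fitted integrated hazards t_i * exp(x_i' beta_hat); unit exponential under the null.
    e = [ti * math.exp(hi) for ti, hi in zip(t, eta_hat)]
    return sum(1.0 - 3.0 * ei + ei * ei for ei in e) / math.sqrt(4.0 * len(t))

# Intercept-only illustration: the normal equation gives exp(beta_hat) = 1 / mean(t).
t = [0.5, 1.0, 1.5, 5.0]
eta_hat = [-math.log(sum(t) / len(t))] * len(t)
z = exponential_frailty_test(t, eta_hat)
```

The second bracket of the residual score does not appear in the code because it sums to zero at the MLE whenever the model contains an intercept.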

3.2.2. Weibull Baseline Hazard. With a slight modification, we can instead assume the baseline hazard to be of Weibull form, as done in Lancaster (1985), with unknown shape parameter $\alpha$, so that $\lambda_0(t) = \alpha t^{\alpha-1}$. In this case, we have one more nuisance parameter $\alpha$ in addition to the coefficients $\beta$. Replacing the nuisance parameters by their respective MLEs, we find the C(α) test as

$$Z_n = \frac{\sum_i \left(1 - 3t_i^{\hat\alpha}\exp(x_i^\top\hat\beta) + t_i^{2\hat\alpha}\exp(2x_i^\top\hat\beta)\right)}{\sqrt{4n - 4n/q}}$$

with $q = 1 + \psi'(2) - (\psi(2))^2$, in which $\psi'(z)$ is the trigamma function and $\psi(z)$ is the digamma function (details in Appendix B). Lancaster (1985) obtains the same test statistic using, again, the approach of Cox (1983). We will discuss in more detail in Section 4 the relation of the heterogeneity tests of Cox (1983) and Chesher (1984) to the C(α) test.
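3.3. Joint test for location and scale heterogeneity in Gaussian panel data model.

In this example, we consider a two dimensional C(α) test for parameter heterogeneity in a Gaussian panel data model. The model is assumed to be

$$y_{it} = \mu_i + \sigma_i\epsilon_{it}$$

with $\mu_i = \mu_0 + \xi_1U_{1i}$ and $\sigma_i^2 = \sigma_0^2\exp(\xi_2U_{2i}) \ge 0$. The random variables $U_{ki}$ are i.i.d. with distribution $F_k$ for $k = 1, 2$. Both $U_1$ and $U_2$ have zero mean and unit variance and are assumed to be independent.

The unconditional density of observing $(y_{i1}, \dots, y_{iT})$ is

$$f_i = \int\!\!\int \left(\frac{1}{\sqrt{2\pi\sigma_0^2\exp(\xi_2u_{2i})}}\right)^T \exp\left\{-\sum_{t=1}^T \frac{(y_{it} - \mu_0 - \xi_1u_{1i})^2}{2\sigma_0^2\exp(\xi_2u_{2i})}\right\} dF_1(u_{1i})\,dF_2(u_{2i}).$$

The respective scores for $(\xi_1, \xi_2)$ and the nuisance parameters $(\mu_0, \sigma_0^2)$ are

$$v_{1i} = \nabla^2_{\xi_1}\log f_i\,|_{\xi_1=\xi_2=0} = \frac{T^2(\bar{y}_i - \mu_0)^2}{\sigma_0^4} - \frac{T}{\sigma_0^2}$$
$$v_{2i} = \nabla^2_{\xi_2}\log f_i\,|_{\xi_1=\xi_2=0} = \left(z_i - \tfrac{T}{2}\right)^2 - z_i$$
$$v_{3i} = \nabla_{\mu_0}\log f_i\,|_{\xi_1=\xi_2=0} = \frac{T(\bar{y}_i - \mu_0)}{\sigma_0^2}$$
$$v_{4i} = \nabla_{\sigma_0^2}\log f_i\,|_{\xi_1=\xi_2=0} = \left(z_i - \tfrac{T}{2}\right)/\sigma_0^2$$

where $\bar{y}_i$ is the sample mean defined as $\sum_{t=1}^T y_{it}/T$ and $2z_i = \sum_{t=1}^T (y_{it} - \mu_0)^2/\sigma_0^2 \sim \chi^2_T$.

Replacing the nuisance parameters by their MLEs, the optimal C(α) test for $H_0: \xi_1 = \xi_2 = 0$ against $H_a: \xi_i \neq 0$ for at least one $i$ is:

$$T_n = (0 \vee t_{1n})^2 + (0 \vee t_{2n})^2$$

with

$$t_{1n} = \left(2NT(T-1)/\hat\sigma^4\right)^{-1/2}\left(\sum_i \left(\frac{T(\bar{y}_i - \hat\mu)}{\hat\sigma^2}\right)^2 - \frac{NT}{\hat\sigma^2}\right)$$
$$t_{2n} = \left(NT(T/2 + 1)\right)^{-1/2}\left(\sum_i \left(\hat{z}_i - T/2\right)^2 - \frac{NT}{2}\right)$$

We reject $H_0$ for $T_n > c_\alpha$ where $c_\alpha$ is the $(1 - \alpha)$-quantile of $\frac{1}{4}\chi^2_0 + \frac{1}{2}\chi^2_1 + \frac{1}{4}\chi^2_2$.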

Remark. The first component $t_{1n}$ of the test statistic may be recognized as the test for individual effects in the Gaussian panel data model proposed by Breusch and Pagan (1980). The second component $t_{2n}$ is equivalent to a single parameter C(α) test for a Gamma model with heterogeneous scale parameter. The factorization provided by the Gaussian model leads to simple asymptotics of the test statistic. Dependence between the random effects $U_1$ and $U_2$ would introduce more complicated weights for the $\chi^2$ mixture, as alluded to in the discussion at the end of Section 2. (Computational details are in Appendix B.)
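
The joint test can be sketched in Python as follows. This is an illustration under stated assumptions rather than a definitive implementation: the panel is balanced and stored as a list of $N$ rows of length $T \ge 2$, the nuisance parameters are replaced by their pooled MLEs, and the p-value uses the mixture with weights (1/4, 1/2, 1/4) on $\chi^2_0$, $\chi^2_1$, $\chi^2_2$. The function name `gaussian_panel_test` is hypothetical.

```python
import math

def gaussian_panel_test(y):
    """Return (t1n, t2n, Tn, p-value) for a balanced panel y[i][t]."""
    n_units, n_periods = len(y), len(y[0])
    total = n_units * n_periods
    mu = sum(sum(row) for row in y) / total                      # MLE of mu_0
    s2 = sum((v - mu) ** 2 for row in y for v in row) / total    # MLE of sigma_0^2
    ybar = [sum(row) / n_periods for row in y]
    z = [sum((v - mu) ** 2 for v in row) / (2.0 * s2) for row in y]
    # Location component (Breusch-Pagan type).
    t1 = (sum((n_periods * (m - mu) / s2) ** 2 for m in ybar) - total / s2) \
        / math.sqrt(2.0 * total * (n_periods - 1) / s2 ** 2)
    # Scale component.
    t2 = (sum((zi - n_periods / 2.0) ** 2 for zi in z) - total / 2.0) \
        / math.sqrt(total * (n_periods / 2.0 + 1.0))
    stat = max(0.0, t1) ** 2 + max(0.0, t2) ** 2
    if stat <= 0.0:
        return t1, t2, 0.0, 1.0
    p = 0.5 * math.erfc(math.sqrt(stat / 2.0)) + 0.25 * math.exp(-stat / 2.0)
    return t1, t2, stat, p

# Two units whose observations differ only in level.
t1, t2, stat, p = gaussian_panel_test([[-1.0, -1.0], [1.0, 1.0]])
```

In the small illustration the location component is positive and the scale component is negative, so only the former contributes to the statistic.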

4. Reparameterization and Connection to the Information Matrix Test

4.1. Reparameterization. A common strategy in prior literature to circumvent the irregularity, that the score function is degenerately zero, is to reparameterize the model. In fact, this is the advice given in Neyman's original (1959) C(α) paper (Section 9, p. 225) and also in Cox and Hinkley (1974, p. 117-118). For the heterogeneity tests considered in this paper in particular, Cox (1983) and Chesher (1984) adopt such a reparameterization by letting $\eta = \xi_0 + (\xi - \xi_0)^2$. Reconsidering the example in Section 2, without loss of generality, we set $\xi_0 = 0$ and have the density function as $p(x; \lambda_0 + \tau\sqrt{\eta}\,U_i)$. Cox (1983) tests for heterogeneity of $\lambda$ by testing $H_0: \eta = 0$ against $H_1: \eta > 0$. Chesher (1984) takes the same model assuming $U$ follows a symmetric location-scale distribution.

At first sight, reparameterization avoids the irregularity of having a degenerate score function. The first order derivative with respect to $\eta$, albeit an undefined $0/0$ function, can be evaluated by l'Hôpital's rule. As long as $E(U^2)$ is non-zero, the score function is nonvanishing. The score test thus derived will be identical to the C(α) test using the original parameterization $\lambda_i = \lambda_0 + \tau\xi U_i$. However, the second order derivative for $\eta$ is unbounded unless we impose an additional moment condition on $U$, that $E(U^3) = 0$ (see the derivation in Appendix C). This condition is implicitly satisfied in Chesher (1984) because of the symmetric distribution assumption on $U$. Moran (1973) also employed this zero third moment condition and remarked that it was hard to rationalize. One explanation for this extra condition is that the original, more natural specification of the random parameter $\lambda_i = \lambda_0 + \tau\xi U_i$ with $\xi \in \mathbb{R}$ is not equivalent to the reparameterization $\lambda_i = \lambda_0 + \tau\sqrt{\eta}\,U_i$ with $\eta \in \mathbb{R}_+$ unless $U$ has a symmetric distribution. As we have seen, the $\xi$ parameterization has the advantage that no symmetry or higher moment conditions are necessary.



4.2. Connection to the Information Matrix test. Chesher (1984) also points out that 
White's (1982) Information Matrix (IM) test is a score test for unobserved heterogeneity. 
Since Chesher (1984) can be viewed as a reparameterized C(a) test, it is of interest to in- 
vestigate the connection between the C(a) test for heterogeneity in general and the IM test. 



Take again the example in Section 2.2: $Y_1, \dots, Y_n$ are i.i.d. random variables each with density function $p(y; \lambda_i)$. The parameter $\lambda_i$ is a random parameter and we assume it now takes a more general form $\lambda_i = \lambda_0 + \xi k(\lambda_0)U_i$ to incorporate both additive and multiplicative specifications. For example, if $k(\lambda_0) = 1$, we have the additive form $\lambda_i = \lambda_0 + \xi U_i$, while if $k(\lambda_0) = \lambda_0$, we have the multiplicative form. The function $k(\lambda_0)$ thus allows flexible specification for the random parameter.



For simplicity and to fix ideas, we first assume $\lambda_0$ is known. Theorem 1 then implies the following expansion of the log-likelihood function, provided that $\xi_n = O(n^{-1/4})$,

$$l = \sum_i \log\int p(y_i; \lambda_i)\,dF(u) = \sum_i \log p(y_i; \lambda_0) + \frac{1}{2}\xi_n^2E(U_i^2)\sum_i k(\lambda_0)^2\frac{\nabla^2_\lambda p(y_i; \lambda_0)}{p(y_i; \lambda_0)} + O_p(1).$$

The first order derivative of $l$ with respect to $\xi_n$ is zero evaluated at $\xi_n = 0$, and the second-order score is

$$\frac{\partial^2 l}{\partial\xi_n^2}\Big|_{\xi_n=0} = \sum_i k(\lambda_0)^2\frac{\nabla^2_\lambda p(y_i; \lambda_0)}{p(y_i; \lambda_0)}.$$

If $\lambda_0$ is unknown, we find the corresponding score for $\lambda_0$ and take the projection step to get the C(α) test. This is very close to the approximation in Cox (1983) except we allow for a more flexible variance function for the random parameter $\lambda_i$, as $\xi^2E(U_i^2)k(\lambda_0)^2$. In a regression model with covariates, $\lambda_0$ will then be a function of the covariates with coefficients $\beta$.

White's (1982) Information Matrix test under a regression setting, on the other hand, is constructed based on the following moment conditions:

$$E\left[\mathrm{vech}\left(\nabla^2_\beta\log p(y; \lambda_0(x_i, \beta)) + \nabla_\beta\log p(y; \lambda_0(x_i, \beta))\nabla_\beta\log p(y; \lambda_0(x_i, \beta))^\top\right)\right] = 0$$

where vech is the operator which stacks the elements in the lower triangular part of a symmetric matrix. By using the chain rule and focusing only on the moment condition for the intercept term $\beta_0$, the IM test statistic uses the following sample analogue of the moment condition

$$IM = \sum_i \left[\frac{\nabla^2_\lambda p(y_i; \lambda_0(x_i, \beta))}{p(y_i; \lambda_0(x_i, \beta))}\left(\nabla_{\beta_0}\lambda_0(x_i, \beta)\right)^2 + \frac{\nabla_\lambda p(y_i; \lambda_0(x_i, \beta))}{p(y_i; \lambda_0(x_i, \beta))}\nabla^2_{\beta_0}\lambda_0(x_i, \beta)\right].$$

For the C(α) test to be equivalent to the IM test, it is sufficient to have the following two identities:

$$C\left(\nabla_{\beta_0}\lambda_0(x_i, \beta)\right)^2 = k(\lambda_0(x_i, \beta))^2$$
$$\sum_i \frac{\nabla_\lambda p(y_i; \lambda_0(x_i, \beta))}{p(y_i; \lambda_0(x_i, \beta))}\nabla^2_{\beta_0}\lambda_0(x_i, \beta) = 0$$

where $C$ is a non-zero constant. We give several examples below as illustrations.



Example 4.1. Normal regression with $Y_i \sim N(\mu_i, 1)$, where $\mu_i = \mu_{0i} + \xi k(\mu_{0i})U_i$ and $\mu_{0i} = x_i^\top\beta$.

Note that $\nabla_{\beta_0}\mu_{0i} = 1$ and $\nabla^2_{\beta_0}\mu_{0i} = 0$; the IM test is equivalent to the C(α) test if $k(\mu_{0i}) = C \neq 0$ and all nuisance parameters are replaced by their MLEs. Any other functional form of $k(\mu_{0i})$ leads to a test statistic that is different from the IM test.

Example 4.2. Poisson regression with $Y_i \sim \mathrm{Poi}(\lambda_i)$, where $\lambda_i = \lambda_{0i} + \xi k(\lambda_{0i})U_i$ and $\lambda_{0i} = \exp(x_i^\top\beta)$.

Now $\nabla_{\beta_0}\lambda_{0i} = \nabla^2_{\beta_0}\lambda_{0i} = \lambda_{0i}$. If the $\beta$'s are replaced by their MLEs, the second identity for equivalence holds because the normal equation for the MLE of $\beta_0$ gives

$$\sum_i \frac{\nabla_\lambda p(y_i; \lambda_{0i})}{p(y_i; \lambda_{0i})}\nabla^2_{\beta_0}\lambda_{0i} = \sum_i \frac{\nabla_\lambda p(y_i; \lambda_{0i})}{p(y_i; \lambda_{0i})}\nabla_{\beta_0}\lambda_{0i} = 0.$$

Therefore, the IM test is equivalent to the C(α) test if $k(\lambda_{0i}) = \lambda_{0i}$, which is satisfied for the multiplicative alternative $\lambda_i = \lambda_{0i}(1 + \xi U_i)$. This specification is a first order linear approximation of the alternative form $\lambda_i = \lambda_{0i}\exp(\xi U_i)$ for small $\xi$, which leads to the second moment test for the Poisson regression model as discussed in Section 3.1.1.



In summary, when the model contains covariates, the C(a) test is equivalent to the IM 
test only under a particular alternative specification, provided that the nuisance parameters 
are also replaced by their corresponding restricted MLEs. When the model does not contain covariates, the IM test will be equivalent to the C(α) test because the function $k(\lambda_0)$ is no longer individual specific and can be factored out as a constant from the score function. It will then be cancelled when we rescale the score by its standard deviation to form the C(α) test statistic. A similar conclusion is reached in several papers that observe that the score test for unobserved parameter heterogeneity is not always identical to the IM test. In particular, Cameron and Trivedi (1986) and Dean (1992) discuss the impact of different specifications on the overdispersion test statistic for count data regression models.



5. Conclusion 

We have shown that Neyman's C(a) test provides a unified approach to testing for ne- 
glected heterogeneity in parametric models. The irregularity encountered in these testing 
problems, that the score function is identically zero, can be circumvented by defining a 
second-order score function. Optimality of this new score function is established by formulating the problem in LeCam's LAN framework and examining the associated limit experiment. This framework provides neater regularity conditions in the irregular problem as compared to the classical approach in Neyman (1959).

The C(α) test inherits the chief merit of the score test: computation is easy because only the null model needs to be estimated. In contrast, the likelihood ratio test, in the face of the generally unknown heterogeneity distribution F, is computationally challenging. We have also seen that the C(α) test has local power against a wide class of alternatives, which allows us to avoid strict parametric assumptions on F, relying instead on weaker moment conditions. A further advantage of the LeCam framework is that it enables us to dispense with symmetry and higher order moment conditions that have been employed in earlier work.

A straightforward generalization of the theorems in Section 2 would be to incorporate density functions that allow the first $(m - 1)$ logarithmic derivatives to vanish. Rotnitzky, Cox, Bottai, and Robins (2000) also discuss estimation problems in this general case under classical MLE type conditions. In such cases, we can define the $m$-th order derivative of the log density as the score function and require the Pitman-type local alternative to be of order $n^{-1/(2m)}$. LeCam's DQM condition needs to be modified by raising the corresponding elements in the expansion to the $m$-th power, as we did for $m = 2$ in Definition 1. It is curious to observe that only when $m$ is an even integer is the test required to be one-sided. When $m$ is odd, we can use reparameterization to transform the irregular problem back to a regular case, without imposing additional restrictions (i.e. symmetry of the distribution F).

A drawback of the C(a) test, as reflected in Neyman (1979), is that asymptotic optimality 
of the test is only established under local alternatives. The approximation of the power 
function, which is characterized by the asymptotic behavior of the test statistics under 
such alternatives, relies on n tending to infinity and the parameter $\xi_n$ converging to the
null value $\xi_0$. The behavior of the power function for finite samples or fixed alternatives
is largely unknown. It is also of interest to compare the power behavior of the C(a) test 
to the likelihood ratio test in these random coefficient models. We hope to pursue these 
investigations in future work. 
Appendix A. Proof of theorems

Before proceeding to the proof of Theorem 1, we first prove the following lemma as an adaptation of Pollard (1997, Lemma 1). Denote $f_n = \sqrt{p(x_i; \xi_n, \theta_n)}$ and $f_0 = \sqrt{p(x_i; \xi_0, \theta)}$. Let $v_\xi$ and $v_\theta$ be shorthand for $v_\xi(x_i)$ and $v_\theta(x_i)$. Let $\|\cdot\|$ be the $L_2(\mu)$-norm and $\langle\cdot,\cdot\rangle$ be the inner product. If it contains a vector, then it is defined as the vector of inner products for each element. Further, let $r_n(x_i, \xi_n, \theta_n) = f_n - f_0 - (\xi_n - \xi_0)^2v_\xi - (\theta_n - \theta)^\top v_\theta$ and denote $R_i = r_n(x_i, \xi_n, \theta_n)/f_0$.

Lemma 1. Under Assumption 1 and the modified DQM condition, we have the following:

(1) $\sum_i R_i^2 = o_p(1)$

(2) $E(v(X)/f_0) = 0$

(3) $2\sum_i R_i = -\frac{1}{4}t^\top Jt + o_p(1)$

(4) $\frac{1}{\sqrt{n}}\sum_i R_iv_\xi/f_0 = o_p(1)$, $\frac{1}{\sqrt{n}}\sum_i R_iv_\theta/f_0 = o_p(1)$

(5) $\max_i |R_i| = o_p(1)$

(6) $\max_i \left|\frac{2v_\xi}{\sqrt{n}f_0}\right| = o_p(1)$, $\max_i \left|\frac{2v_\theta}{\sqrt{n}f_0}\right| = o_p(1)$

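Proof of (1). Under the modified DQM condition, the Markov inequality yields,

$$P\Big(\sum_i R_i^2 > \epsilon\Big) \le \epsilon^{-1}nE(R_i^2) = \epsilon^{-1}n\int r_n^2(x; \xi_n, \theta_n)\,d\mu(x) \to 0.$$

Proof of (2) and (3). Since both $f_n$ and $f_0$ are objects with $L_2(\mu)$-norm 1,

$$0 = \|f_n\|^2_{\mu,2} - \|f_0\|^2_{\mu,2}$$
$$= (\xi_n - \xi_0)^4\|v_\xi\|^2_{\mu,2} + (\theta_n - \theta)^\top\|v_\theta\|^2_{\mu,2}(\theta_n - \theta) + 2(\xi_n - \xi_0)^2(\theta_n - \theta)^\top\langle v_\xi, v_\theta\rangle$$
$$+ \|r_n\|^2_{\mu,2} + 2(\xi_n - \xi_0)^2\langle v_\xi, r_n\rangle + 2\langle(\theta_n - \theta)^\top v_\theta, r_n\rangle$$
$$+ 2(\xi_n - \xi_0)^2\langle f_0, v_\xi\rangle + 2(\theta_n - \theta)^\top\langle f_0, v_\theta\rangle + 2\langle f_0, r_n\rangle.$$

Note that by the Cauchy-Schwarz inequality and the fact that both $v_\xi$ and $v_\theta$ are square integrable with respect to the measure $\mu$ by assumption, $\langle v_\xi, r_n\rangle = o(1/\sqrt{n})$ and $\langle v_\theta, r_n\rangle = o(1/\sqrt{n})$. Therefore, the fourth to sixth terms are all of order $o(1/n)$. The seventh and eighth terms are both of order $O(1/\sqrt{n})$, so in order for the identity to hold, we must have

$$\langle f_0, v_\xi\rangle = \langle f_0, v_\theta\rangle = 0.$$

This proves (2) since $0 = \langle f_0, v_\xi\rangle = E(v_\xi(X)/f_0)$. A similar argument shows $E(v_\theta(X)/f_0) = 0$. Hence,

$$2\langle f_0, r_n\rangle = -(\xi_n - \xi_0)^4\|v_\xi\|^2_{\mu,2} - (\theta_n - \theta)^\top\|v_\theta\|^2_{\mu,2}(\theta_n - \theta) - 2(\xi_n - \xi_0)^2(\theta_n - \theta)^\top\langle v_\xi, v_\theta\rangle + o(1/n)$$
$$= -\frac{1}{4n}t^\top Jt + o(1/n)$$

with $t^\top = (\delta_1^2, \delta_2^\top)$.

Since $V(2\sum_i R_i)$ is bounded above by $4\sum_i E(R_i^2)$, which goes to 0 from (1), we have

$$2\sum_i R_i = 2nE(R_i) + o_p(1) = 2n\langle f_0, r_n\rangle + o_p(1) = 2n\left(-\frac{1}{8n}t^\top Jt + o(1/n)\right) + o_p(1) = -\frac{1}{4}t^\top Jt + o_p(1).$$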

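Proof of (4). By Hölder's inequality,

$$\left|\frac{1}{\sqrt{n}}\sum_i R_i\frac{v_\xi}{f_0}\right| \le \left(\sum_i R_i^2\right)^{1/2}\left(\frac{1}{n}\sum_i \frac{v_\xi^2}{f_0^2}\right)^{1/2} = o_p(1)O_p(1) = o_p(1).$$

A similar argument gives the second result.

Proof of (5).

$$P\big(\max_i |R_i| > \epsilon\big) \le nP(|R_i|^2 > \epsilon^2) \le \epsilon^{-2}nE(R_i^2) \to 0.$$

Proof of (6).

$$P\big(\max_i |2v_\xi/f_0| > \epsilon\sqrt{n}\big) \le nP\big(|2v_\xi/f_0| > \epsilon\sqrt{n}\big) \le \epsilon^{-2}E\left[(2v_\xi(X_i)/f_0)^2\,1\{|2v_\xi/f_0| > \epsilon\sqrt{n}\}\right] \to 0.$$

A similar argument gives the second statement. ■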

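Proof of Theorem 1. We consider $\xi_n = \xi_0 + \delta_1n^{-1/4}$ and $\theta_n = \theta + \delta_2n^{-1/2}$ throughout the proof. Under Assumption 1, we have the following Taylor expansion:

$$f_n = f_0 + (\xi_n - \xi_0)^2v_\xi + (\theta_n - \theta)^\top v_\theta + r_n(x_i; \xi_n, \theta_n).$$

Denoting $w_i = 2(f_n/f_0 - 1)$, we have

$$w_i = 2(\xi_n - \xi_0)^2\frac{v_\xi}{f_0} + 2(\theta_n - \theta)^\top\frac{v_\theta}{f_0} + 2R_i.$$

To show that under the modified DQM condition the log-likelihood ratio admits a quadratic approximation, we use the results in Lemma 1.

The log-likelihood ratio can be represented as

$$\Lambda_n = \sum_i \log\frac{p(x_i; \xi_n, \theta_n)}{p(x_i; \xi_0, \theta)} = \sum_i 2\log\frac{f_n}{f_0} = \sum_i 2\log\left(1 + \frac{w_i}{2}\right) = \sum_i w_i - \frac{1}{4}\sum_i w_i^2 + \frac{1}{2}\sum_i w_i^2\beta(w_i)$$

with $\beta(x) \to 0$ as $x \to 0$.

Using (3) in Lemma 1 and with $S_n = (S_{\xi,n}, S_{\theta,n}^\top)^\top$ and $J$ defined in Theorem 1, we have

$$\sum_i w_i = \frac{2\delta_1^2}{\sqrt{n}}\sum_i \frac{v_\xi}{f_0} + \frac{2\delta_2^\top}{\sqrt{n}}\sum_i \frac{v_\theta}{f_0} + 2\sum_i R_i = t^\top S_n - \frac{1}{4}t^\top Jt + o_p(1).$$

Using (1) and (4) in Lemma 1, we have

$$\sum_i w_i^2 = \sum_i \left(\frac{2\delta_1^2v_\xi}{\sqrt{n}f_0} + \frac{2\delta_2^\top v_\theta}{\sqrt{n}f_0}\right)^2 + 4\sum_i R_i^2 + 4\sum_i R_i\left(\frac{2\delta_1^2v_\xi}{\sqrt{n}f_0} + \frac{2\delta_2^\top v_\theta}{\sqrt{n}f_0}\right)$$
$$= t^\top Jt + o_p(1) + 4\sum_i R_i^2 + 4\sum_i R_i\left(\frac{2\delta_1^2v_\xi}{\sqrt{n}f_0} + \frac{2\delta_2^\top v_\theta}{\sqrt{n}f_0}\right)$$
$$= t^\top Jt + o_p(1).$$

Lastly, we need to show that $\sum_i w_i^2\beta(w_i) = o_p(1)$. First note that using (5) and (6) in Lemma 1, we have

$$P\big(\max_i |w_i| > \epsilon\big) \le P\left(\delta_1^2\max_{1\le i\le n}\left|\frac{2v_\xi}{\sqrt{n}f_0}\right| > \frac{\epsilon}{3}\right) + P\left(\|\delta_2\|\max_{1\le i\le n}\left\|\frac{2v_\theta}{\sqrt{n}f_0}\right\| > \frac{\epsilon}{3}\right) + P\left(2\max_i |R_i| > \frac{\epsilon}{3}\right) \to 0.$$

Since $\beta(w_i) \to 0$ when $w_i \to 0$, we have $\max_i |\beta(w_i)| = o_p(1)$. By Hölder's inequality,

$$\sum_i w_i^2|\beta(w_i)| \le \max_i |\beta(w_i)|\sum_i w_i^2 = o_p(1)O_p(1) = o_p(1).$$

Therefore, the log-likelihood ratio is approximated by

$$\Lambda_n = \sum_i w_i - \frac{1}{4}\sum_i w_i^2 + \frac{1}{2}\sum_i w_i^2\beta(w_i) = t^\top S_n - \frac{1}{2}t^\top Jt + o_p(1).$$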
Proof of Corollary 1. Since $S_n$ is a normed i.i.d. sum, by the central limit theorem, $S_n \rightsquigarrow N(0, J)$ under $P_{n,\xi_0,\theta}$. The zero asymptotic mean of $S_n$ is provided by (2) in Lemma 1, and the asymptotic variance of $S_n$ is $J$ as defined in Theorem 1.

The quadratic approximation for $\Lambda_n$ established in Theorem 1 together with the joint normality of $S_n$ leads to the LAN property of the sequence of models $P_{n,\xi_n,\theta_n}$. Furthermore, under $P_{n,\xi_0,\theta}$ we have

$$\Lambda_n \rightsquigarrow N\left(-\tfrac{1}{2}t^\top Jt,\ t^\top Jt\right).$$

By LeCam's first lemma (see e.g. van der Vaart (1998, Lemma 6.4)), $P_{n,\xi_n,\theta_n}$ and $P_{n,\xi_0,\theta}$ are mutually contiguous. ■

Proof of Theorem 2. The sequence of experiments $\mathcal{E}_n$ converges to a shifted Gaussian $N(t, J^{-1})$ as a result of Theorem 9.4 in van der Vaart (1998). The log-likelihood ratio process of observing one sample from $N(t, J^{-1})$ is

$$\log\frac{dN(t, J^{-1})}{dN(0, J^{-1})}(Y) = t^\top JY - \frac{1}{2}t^\top Jt.$$

It suffices to show that $J^{-1}S_n$ converges to the distribution of $Y$ under the null. Corollary 1 establishes $S_n \rightsquigarrow N(0, J)$ under $P_{n,\xi_0,\theta}$; we thus have $J^{-1}S_n \rightsquigarrow N(0, J^{-1})$ under $P_{n,\xi_0,\theta}$.

The optimal test statistic for $H_0: \delta_1 = 0$ against $H_a: \delta_1 \neq 0$ in the limit experiment is the first element in $Y$. The sequence of test statistics from the original experiment $\mathcal{E}_n$ that matches the first element in $Y$ is the C(α) statistic,

$$Z_n = (J_{\xi\xi} - J_{\xi\theta}J_{\theta\theta}^{-1}J_{\theta\xi})^{-1/2}(S_{\xi,n} - J_{\xi\theta}J_{\theta\theta}^{-1}S_{\theta,n}).$$

Notice the rescaling in $Z_n$ is needed to obtain a unit asymptotic variance for the test statistic. ■

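Proof of Corollary 2. Since $\xi$ is a scalar and $S_n \rightsquigarrow N(0, J)$ under $H_0$, it is immediate that the asymptotic null distribution of $Z_n$ is $N(0, 1)$.

We can now use LeCam's third lemma (see e.g. van der Vaart (1998, Example 6.7)) to derive the asymptotic distribution of $Z_n$ under local alternatives. We are interested in the local alternative $\xi_n = \xi_0 + \delta_1n^{-1/4}$ with the nuisance parameter $\theta$ left unspecified as in the null, hence we set $\delta_2 = 0$ in the log-likelihood ratio expansion. Under $H_0$, $(Z_n, \Lambda_n)$ is asymptotically jointly normal with $\sigma_{12} = \mathrm{Cov}(Z_n, \Lambda_n) = \delta_1^2(J_{\xi\xi} - J_{\xi\theta}J_{\theta\theta}^{-1}J_{\theta\xi})^{1/2}$. With $\delta_2 = 0$, Corollary 1 implies that $P_{n,\xi_n,\theta}$ is mutually contiguous to $P_{n,\xi_0,\theta}$; LeCam's third lemma then implies that, under $P_{n,\xi_n,\theta}$,

$$Z_n \rightsquigarrow N\left(\delta_1^2(J_{\xi\xi} - J_{\xi\theta}J_{\theta\theta}^{-1}J_{\theta\xi})^{1/2},\ 1\right).$$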
Proof of Theorem 3. Define the class of functions:

$$\mathcal{F}_n := \{x \mapsto (g(x, \theta) - g(x, \eta)) : \|\theta - \eta\| \le \delta_n\}.$$

If $\hat\theta$ is a $\sqrt{n}$-consistent estimator of $\theta$, and $\delta_n = O(n^{-\kappa})$ with $\kappa < 1/2$, we obtain that with probability tending to one

$$|Z_n(\hat\theta) - Z_n(\theta)| \le \sup_{f\in\mathcal{F}_n}|G_n(f)|$$

where $G_n(f) := n^{-1/2}\sum_i (f(X_i) - Ef(X_i))$ denotes the empirical process indexed by $\mathcal{F}_n$. Proving $Z_n(\hat\theta) - Z_n(\theta) = o_p(1)$ thus amounts to establishing asymptotic equicontinuity of the process $G_n$ with respect to the Euclidean norm.

Let the parameter space near true 9, U5^(9), be covered by balls with radius e^^'*', 
the number of balls can be upper bounded by Cie^'^^^ with Ci as a constant that does 
not depend on n and p being the dimension of the nuisance parameter space. Then for 
Vri G U6^(9), 3Nti, such that 

11^1 - TIN, Ke'/^ 
The condition on g in Assumption [2] implies 

lg(x,ri) - g(x,riNjl ^ ||ri -riNjrH(x) ^ eH(x) 

It follows that the bracketing number, N[ ] (e||H||2, 3"th '^2(Pn,£,n,e)) is bounded from above 
by Cse-T/T'. 

Furthermore, the assumption also implies that for f G 3"^,, ||f||p^,2 ^ 5n||H||p^^2 with 
'^2(PTt,£,n,0)"iio'^™- We can now apply Theorem 2.14.2 in van der Vaart and Wellner (1996) 
and get 

Ep^ e( sup |Gn{f)|) ^ J[ ] [81, 3"n, £2(Pn,u,e))||H||p,,2+^/^^Ep^ , [H(X)1{H(X) > V^a[8l)}] 
where the bracketing integral is defined as 



J[](5j;,3-n,£2(Pn,u,e)) 



l + logN[](e||H||p^,2,^n,£2(Pn,tn,e))de 



and 



a(5^;) =5?;||H||2/^l + logNn(5?;||H||p^,2,3"n,£2(Pn,tn,e)). 
Provided that 5^ — >• 0, we have for n large enough, 

r5l 



J[](5j;,Jn,£2(Pn,U,e)) ^ 



l + log(C2e-T'/^)de ^ 



Since H(x) is square integrable for all n by Assumption [2| the first term goes to zero. 
The upper bound for the bracketing number also yields a lower bound for $a(\delta_n)$, that is,
for $\delta_n$ sufficiently small,

CKY-i \ Sn||H||p 2 , ^ „ 

^l + log(C25n^) 

As long as kn converges to zero slower than Cn, Assumption [2] ensures that the second 
term also tends to zero. 

The last step is to check that sup -i= Y.i ^Pn ? e (^i)) = so that sup |Gn(f)| is the 

correct upper bound. This is trivially true under the null, where E^n = for all n G N, since 
Ep^ ^^ g(g(Xi,e)) = Ep^ ^^ g(g(Xi,e]) = 0. Under local alternatives with £,n = £,o + Sin-^/"^ 
and given the i.i.d. assumption on the sample, it suffices to show that 



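$$\sup_{\|\eta - \theta\| \le \delta_n}\sqrt{n}\left|\int (g(x, \eta) - g(x, \theta))p(x; \xi_n, \theta)\,dx\right| = o(1).$$

Denote $p_n = p(x; \xi_n, \theta)$ and $p_0 = p(x; \xi_0, \theta)$; we have the following expansion

$$\sqrt{n}\int (g(x, \eta) - g(x, \theta))p_n\,dx = \sqrt{n}\int (g(x, \eta) - g(x, \theta))\sqrt{p_n}\sqrt{p_0}\,dx$$
$$+ \sqrt{n}(\xi_n - \xi_0)^2\int (g(x, \eta) - g(x, \theta))\sqrt{p_n}\,v_\xi(x)\,dx$$
$$+ \sqrt{n}\int (g(x, \eta) - g(x, \theta))\sqrt{p_n}\,r_n\,dx.$$

The last two terms are $o(1)$ uniformly over $\eta$ for $\|\eta - \theta\| \le \delta_n$ due to the DQM condition in Definition 1 and the assumption on $g$ in Assumption 2, since the Cauchy-Schwarz inequality implies that with respect to the $L_2(\mu)$-norm,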

I J(g(x,r|) - g(x, e))^/p;^V£,(x)dx| ^ ||(g(x,r|) - g(x, e))^/p^^||^,2||v£,||^,2 

^ l|T^-e|n|H||p^,2||vdl^,2 = o(l). 

Similarly, 



/^Lj(g(x,ri) - g(x,e))^/p;;:rndx| ^ ||(g(x,ri) - g(x, e))^/p^|| |j,,2/a||rn|| ^,2 = o(l). 

The first term is also o(l) by expanding y/p^ again and applying Cauchy-Schwarz inequal- 
ity in a similar fashion. ■ 

Proof of Theorem |4] As in the proof of Theorem [2j the limit of the sequence Vn is 
a shifted Gaussian experiment Y ~ ?Nf(t, J^"*^) but now with t""" = (5|, 63, An equivalent 
limit experiment observes X ~ 3sf(t^J, J) with X = JY, because the likelihood ratio process 

°^ dX(oj-i) is identical to that of "^^jlf^fojj^ (X). 

To be more explicit, denote the first two elements of $X$ by $X_\xi$ and the rest by 
$X_\theta$; under the alternative, the mean is partitioned conformably, 
with $t_\xi = (\delta_1^2, \delta_2)$ and $t_\theta = \delta_3$. 

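To focus on testing for zero restrictions on $t_\xi$, we find the conditional distribution of $X_\xi$ 
on $X_\theta$ to be 

$$X_\xi - J_{\xi\theta} J_{\theta\theta}^{-1} X_\theta \sim N\left(t_\xi (J_{\xi\xi} - J_{\xi\theta} J_{\theta\theta}^{-1} J_{\theta\xi}),\; J_{\xi\xi} - J_{\xi\theta} J_{\theta\theta}^{-1} J_{\theta\xi}\right).$$

The matched statistic from the original experiment is then 

$$\tilde{S}_{\xi,n} = S_{\xi,n} - J_{\xi\theta} J_{\theta\theta}^{-1} S_{\theta,n}.$$

Under $H_0$, $\tilde{S}_{\xi,n}$ follows $N(0, \Sigma)$ with $\Sigma = J_{\xi\xi} - J_{\xi\theta} J_{\theta\theta}^{-1} J_{\theta\xi}$, and under the local alternative its 
asymptotic distribution is $N(t_\xi \Sigma, \Sigma)$. 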



Let the Cholesky decomposition of L be such that AA^ = Z, then A ^Sf^ 



and A-iS£„n K(ttA, I]. Since tj, = (6?, 5i) G and 



t.A 



The feasible parameter set is therefore the convex cone defined as, 



(Tli,r|2) \ ^2 > 0,Tii - 



When the test statistic takes a value that falls outside the feasible set, it must be projected 
onto the set. This yields the following four cases, as illustrated in the figure. 

Case 1: When the value of the test statistic $W_n = (w_{1n}, w_{2n})$ falls into the shaded area (1), the optimal 
test statistic is the sum of squares of the elements of $W_n$, 

$$T_n = w_{1n}^2 + w_{2n}^2 \sim \chi^2_2.$$

Case 2: When the test statistic falls into area (2), we need to project onto the convex 
cone (1), which gives a point with coordinates $(\rho^2 w_{1n} + \rho\sqrt{1-\rho^2}\, w_{2n},\; \rho\sqrt{1-\rho^2}\, w_{1n} + (1-\rho^2) w_{2n})$. 
The C($\alpha$) test statistic is hence: 

$$\begin{aligned}
T_n &= (\rho^2 w_{1n} + \rho\sqrt{1-\rho^2}\, w_{2n})^2 + (\rho\sqrt{1-\rho^2}\, w_{1n} + (1-\rho^2) w_{2n})^2 \\
&= (\rho\, w_{1n} + \sqrt{1-\rho^2}\, w_{2n})^2 \sim \chi^2_1
\end{aligned}$$

Case 3: When the test statistic $W_n$ falls in area (3), projecting onto the region (1) yields 
$(w_{1n}, 0)$ and thus, 

$$T_n = w_{1n}^2 \sim \chi^2_1$$

Case 4: Lastly, when $W_n$ falls into region (4), projecting onto region (1) yields $(0, 0)$ and 
hence, 

$$T_n = 0 \sim \chi^2_0$$

The asymptotic distribution of the C($\alpha$) test statistic is a mixture of $\chi^2$'s, for which the 
weights are characterized by the probability of falling into the different regions. The angle $\beta$ 
spanned by the shaded area (1) as marked in the figure is $\beta = \cos^{-1}(\rho)$, hence the probability 
of falling into region (1) is $\frac{\beta}{2\pi}$. The probability of falling into (2) or (3) is $\frac{1}{2}$, which leaves the 
probability of falling into (4) as $\left(\frac{1}{2} - \frac{\beta}{2\pi}\right)$. ■ 
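
The four cases and the mixture weights can be sketched numerically. The snippet below is only an illustration under stated assumptions, not the paper's implementation: it assumes the feasible cone (1) is spanned by the rays $(1, 0)$ and $(\rho, \sqrt{1-\rho^2})$, which is what the projections in Cases 2 and 3 and the angle $\beta = \cos^{-1}(\rho)$ suggest, and the function names `cone_statistic` and `mixture_weights` are hypothetical.

```python
import math


def cone_statistic(w1, w2, rho):
    """Return (T_n, region) for W_n = (w1, w2), assuming |rho| < 1.

    Assumption: the feasible cone is spanned by the rays a = (1, 0)
    and u = (rho, sqrt(1 - rho^2)).
    """
    s = math.sqrt(1.0 - rho * rho)
    # Coordinates of W_n in the (a, u) basis: W_n = c1 * a + c2 * u.
    c2 = w2 / s
    c1 = w1 - rho * c2
    if c1 >= 0.0 and c2 >= 0.0:
        return w1 * w1 + w2 * w2, 1  # Case 1: W_n is already feasible
    # Outside the cone, the projection lies on a boundary ray or at the
    # origin; the nearer ray has the larger non-negative inner product.
    along_u = max(rho * w1 + s * w2, 0.0)
    along_a = max(w1, 0.0)
    if along_u == 0.0 and along_a == 0.0:
        return 0.0, 4  # Case 4: projection is the origin
    if along_u > along_a:
        return along_u ** 2, 2  # Case 2: projection onto the ray u
    return along_a ** 2, 3  # Case 3: projection is (w1, 0)


def mixture_weights(rho):
    """Weights on chi^2 with 2, 1 and 0 degrees of freedom."""
    beta = math.acos(rho)
    p2 = beta / (2.0 * math.pi)
    return {2: p2, 1: 0.5, 0: 0.5 - p2}
```

For instance, with $\rho = 0.5$ the cone spans an angle of $\pi/3$, so `mixture_weights(0.5)` puts weight $1/6$ on $\chi^2_2$, $1/2$ on $\chi^2_1$ and $1/3$ on $\chi^2_0$.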

Appendix B. Computational details in examples 

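B.1. Cox Proportional Hazard model with frailty. The second-order score for $\xi$ takes 
the form, 

$$\begin{aligned}
\nabla_\xi^2 \log f(t_i|x_i)\big|_{\xi=0}
&= \left. \frac{\int \lambda_0(t_i) e^{x_i^\top\beta + \xi u_i} e^{-\Lambda_0(t_i) e^{x_i^\top\beta + \xi u_i}} \left[ (1 - \Lambda_0(t_i) e^{x_i^\top\beta + \xi u_i})^2 u_i^2 - u_i^2 \Lambda_0(t_i) e^{x_i^\top\beta + \xi u_i} \right] dF(u_i)}{f(t_i|x_i)} \right|_{\xi=0} \\
&= E(U^2) \left[ (1 - \Lambda_0(t_i) e^{x_i^\top\beta})^2 - \Lambda_0(t_i) e^{x_i^\top\beta} \right] \\
&= 1 - 3\Lambda_0(t_i) e^{x_i^\top\beta} + \Lambda_0(t_i)^2 e^{2 x_i^\top\beta}
\end{aligned}$$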

Specializing to the exponential model, that is $\Lambda_0(t_i) = t_i$, leaves us with no additional 
nuisance parameter in the baseline hazard function. The score function for $\beta$ is, 

$$\nabla_\beta \log f(t_i|x_i)\big|_{\xi=0} = (1 - t_i e^{x_i^\top\beta})\, x_i.$$

The regression coefficient in the residual score for $\xi$ is found to be $[-1, 0, \ldots, 0]$, which 
leads to a residual score of the form, 

$$g(t_i, \beta) = (1 - 3 t_i e^{x_i^\top\beta} + t_i^2 e^{2 x_i^\top\beta}) + (1 - t_i e^{x_i^\top\beta}).$$

The variance of $g(T_i, \beta)$ can be calculated easily by noting that under the null, $E(q^k) = 
\Gamma(k+1)$ where $q = \Lambda_0(T_i) e^{x_i^\top\beta}$, since $E(q^k) = \int q^k f(t_i|x_i)\, dt_i = \int q^k e^{-q}\, dq = \Gamma(k+1)$. 
For the exponential model, $V(g(T_i, \beta)) = 4$. 
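
The moment calculation above is easy to check by hand or by machine. The following minimal sketch (the helper names `poly_mul` and `poly_mean` are hypothetical, and only the intercept component of $x_i$ is considered) represents each score as a polynomial in $q$ and uses $E(q^k) = \Gamma(k+1) = k!$ to recover both the regression coefficient $-1$ and the variance $4$.

```python
from fractions import Fraction
from math import factorial


def poly_mul(a, b):
    """Multiply two polynomials in q given as coefficient lists [c0, c1, ...]."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def poly_mean(coeffs):
    """E[sum_k c_k q^k] for q ~ Exp(1), using E[q^k] = Gamma(k + 1) = k!."""
    return sum(c * factorial(k) for k, c in enumerate(coeffs))


second_order_score = [1, -3, 1]  # 1 - 3q + q^2
beta_score = [1, -1, 0]          # 1 - q (intercept component of the beta score)

# Regression coefficient of the second-order score on the beta score.
# Both scores have mean zero, so the covariance is the mean of the product.
coef = Fraction(poly_mean(poly_mul(second_order_score, beta_score)),
                poly_mean(poly_mul(beta_score, beta_score)))

# Residual score g = second-order score - coef * beta score = 2 - 4q + q^2.
residual = [s - coef * b for s, b in zip(second_order_score, beta_score)]
variance = poly_mean(poly_mul(residual, residual)) - poly_mean(residual) ** 2
```

Exact rational arithmetic is used so that the result is the integer $4$ rather than a floating-point approximation.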

The computation for the Weibull model is more involved because of the additional 
nuisance parameter $\alpha$. The respective scores for all the parameters are 

$$\begin{aligned}
\nabla_\xi^2 \log f(t_i|x_i) &= (1 - t_i^\alpha e^{x_i^\top\beta})^2 - t_i^\alpha e^{x_i^\top\beta} \\
\nabla_\beta \log f(t_i|x_i) &= [1 - t_i^\alpha e^{x_i^\top\beta}]\, x_i \\
\nabla_\alpha \log f(t_i|x_i) &= \frac{1}{\alpha} + \log t_i\, (1 - t_i^\alpha e^{x_i^\top\beta})
\end{aligned}$$

Let $\theta = (\beta, \alpha)$; the information matrix for the nuisance parameters is 



XiX[ 



-Xi 



^H2)-Jc[l , l-i|>'(2]-2iP(2)x't|3 + (x't|3)" 
z. X- 



and the inverse of this matrix is 

q + (i|.(2)-x;(3) ^ 



q 



with $q = 1 + \psi'(2) - (\psi(2))^2$. We further find, 



I 



-2-i^(2)+x;|3 



The regression coefficient in the residual score for $\xi$ is hence, 



a 



-q+2(x|.[2)-x;|3) 



-2(X 

q ' 



and $V\left(\sum_i g(T_i, \beta, \alpha)\right) = V(\nabla_\xi^2 \log f) - I_{\xi\theta} I_{\theta\theta}^{-1} I_{\theta\xi} = n(4 - 4/q)$. 

B.2. Joint test for Gaussian panel data model. The information matrix for $(\xi, \theta) = 
(\xi_1, \xi_2, \mu_0, \sigma_0^2)$ is 



Ie£. lee 



NT 



/2T a2 





1 \ 


a2 {T + 3)a4/2 















V 1 o-o/2 





1/2 y 



We further find 



T T T-iT _^2NT(T-l)/a4 





NT(T/2 + l) 



and 



2 
al 



As we have remarked in Section 2.5 the diagonality of l£„e provides much convenience 
for finding the optimal test statistics. Denote 

\t2nj VIIi^2i-cT2^^V4i; ^ (NT(T/2 + 1) )-i/2 . (z^ - T/2)2 - NT/2) / 

Replacing $(\mu_0, \sigma_0^2)$ by their MLEs yields the joint C($\alpha$) test. 
Appendix C. Claim in Section 4 

Here we provide the detailed derivation of the claim in Section 4 that the reparameterization 
adopted in Chesher (1984) and Cox (1983) for heterogeneity tests requires extra moment 
conditions on $U$ for the second derivative of the log density with respect to the test parameter to 
be bounded. 

Proposition 1. Let $Y_1, \ldots, Y_n$ be iid random variables, each with density function $\int p(y; \lambda_0 + 
\tau\sqrt{\eta}\, u_i)\, dF(u_i)$, where $U_i$ is a random variable with zero mean and unit variance. The 
second-order derivative of the log density with respect to $\eta$, evaluated at $\eta = 0$, is 
unbounded unless $E(U^3) = 0$ and $E(U^4) < \infty$. 



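Proof. Denote the log density as $l = \log \int p(y; \lambda_0 + \tau\sqrt{\eta}\, u_i)\, dF(u_i)$. The first order 
derivative with respect to $\eta$ is 

$$\nabla_\eta l \big|_{\eta=0} = \left. \frac{\tau \int \nabla_\lambda p(y; \lambda_0)\, u\, dF(u)}{2\sqrt{\eta} \int p(y; \lambda_0)\, dF(u)} \right|_{\eta=0} = \frac{\tau^2}{2} \frac{\nabla_\lambda^2 p(y; \lambda_0)}{p(y; \lambda_0)}.$$

The last step is obtained by applying l'Hôpital's rule. 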



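The second order derivative is 

$$\begin{aligned}
\nabla_\eta^2 l \big|_{\eta=0}
&= \left. \frac{\tau^2 \sqrt{\eta} \int \nabla_\lambda^2 p(y; \lambda_0)\, u^2\, dF(u) - \tau \int \nabla_\lambda p(y; \lambda_0)\, u\, dF(u)}{4 \eta \sqrt{\eta} \int p(y; \lambda_0)\, dF(u)} \right|_{\eta=0} - \left( \nabla_\eta l \big|_{\eta=0} \right)^2 \\
&= \left. \frac{\tau^3 \int \nabla_\lambda^3 p(y; \lambda_0)\, u^3\, dF(u)}{12 \sqrt{\eta} \int p(y; \lambda_0)\, dF(u)} \right|_{\eta=0} - \left( \nabla_\eta l \big|_{\eta=0} \right)^2
\end{aligned}$$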



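Provided that $\nabla_\lambda^3 p(y; \lambda_0)$ is not degenerately zero, $\nabla_\eta^2 l$ is unbounded unless $E(U^3) = 0$ and 
$E(U^4) < \infty$, so that we can apply l'Hôpital's rule again and get 

$$\nabla_\eta^2 l \big|_{\eta=0} = \frac{\tau^4}{12} \left[ E(U^4) \frac{\nabla_\lambda^4 p(y; \lambda_0)}{p(y; \lambda_0)} - 3 E(U^2)^2 \left( \frac{\nabla_\lambda^2 p(y; \lambda_0)}{p(y; \lambda_0)} \right)^2 \right] < \infty$$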
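
The role of the third-moment condition can be illustrated numerically. The sketch below is an assumption-laden example rather than part of the original argument: it takes $p(y; \lambda)$ to be the $N(\lambda, 1)$ density, lets $U$ have a finite discrete support, and approximates the second derivative in $\eta$ at zero by a one-sided finite difference. The function names are hypothetical. For a symmetric $U$ the difference quotient settles near the closed-form limit, while for a skewed $U$ with $E(U^3) \neq 0$ it grows as the step shrinks.

```python
import math


def normal_pdf(y, mean):
    """Density of N(mean, 1) at y (illustrative choice of p(y; lambda))."""
    return math.exp(-0.5 * (y - mean) ** 2) / math.sqrt(2.0 * math.pi)


def log_mixture(y, eta, support, probs, tau=1.0, lam0=0.0):
    """l(eta) = log sum_k P(U = u_k) p(y; lam0 + tau * sqrt(eta) * u_k)."""
    shift = tau * math.sqrt(eta)
    return math.log(sum(p * normal_pdf(y, lam0 + shift * u)
                        for u, p in zip(support, probs)))


def second_difference(y, h, support, probs, tau=1.0, lam0=0.0):
    """One-sided finite-difference approximation of d^2 l / d eta^2 at eta = 0."""
    f0 = log_mixture(y, 0.0, support, probs, tau, lam0)
    f1 = log_mixture(y, h, support, probs, tau, lam0)
    f2 = log_mixture(y, 2.0 * h, support, probs, tau, lam0)
    return (f2 - 2.0 * f1 + f0) / (h * h)


def limit_formula(y, support, probs, tau=1.0, lam0=0.0):
    """(tau^4 / 12) [E(U^4) p''''/p - 3 E(U^2)^2 (p''/p)^2] for the N(lam, 1) density."""
    z = y - lam0
    m2 = sum(p * u ** 2 for u, p in zip(support, probs))
    m4 = sum(p * u ** 4 for u, p in zip(support, probs))
    r2 = z * z - 1.0                   # second lambda-derivative of p over p
    r4 = z ** 4 - 6.0 * z * z + 3.0    # fourth lambda-derivative of p over p
    return tau ** 4 / 12.0 * (m4 * r4 - 3.0 * m2 ** 2 * r2 ** 2)


# Symmetric U: mean 0, variance 1, E(U^3) = 0.
symmetric = ((1.0, -1.0), (0.5, 0.5))
# Skewed U: mean 0, variance 1, E(U^3) = 1.5.
skewed = ((2.0, -0.5), (0.2, 0.8))
```

Comparing `second_difference(1.0, 1e-3, *symmetric)` with `limit_formula(1.0, *symmetric)` shows close agreement, whereas `second_difference(2.0, h, *skewed)` keeps increasing in magnitude as `h` is reduced.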



